WE POSE the problem of optimally approximating the values of an unbounded operator, using
elements which are specified solely by the traces (values) of some operator. Estimates are obtained
for the accuracy of the optimal approximation, linear optimal algorithms are found in explicit
form and their structure is examined, and the optimal algorithm is shown to be unique. Some
examples are given.


Many problems in the processing of experimental data can be formulated as a problem in 
computing the values of an unbounded operator. In general, this problem is ill posed [1]. Often
the initial information contains errors, and arrives in discrete form, with the result that the value
of the operator can only be evaluated approximately. It thus becomes extremely important to 
find the (in some sense) best or optimal operator, approximating the initial operator. 


The problem of the optimal approximation of an operator when the information is exactly
specified was considered by Stechkin in [2] and Bakhvalov in [3]. The optimal approximation of
a linear bounded functional when the information is specified approximately was considered by
Marchuk and Osipenko in [4], and by Reinsch in [5]. A particular case of the approximation of
operators was studied in [6]. Order-wise optimal linear operators were found by Morozov in [7].
Mention may also be made of papers by Strakhov [8], by Ivanov and Korolyuk [9], and by
V. V. Ivanov [10].


Let us state our problem. Let H, G, F, V be normed spaces. Let L be a linear operator with
non-empty domain of definition $D_L \subseteq H$, mapping $D_L$ into the space G, and let A be a linear
operator with domain of definition $D_A \subseteq H$, mapping $D_A$ into F. We shall assume that
$D = D_A \cap D_L \neq \varnothing$. Finally, let B be the operator to be evaluated, with domain of definition
$D_B \subseteq H$, mapping $D_B$ into V, such that $D \subseteq D_B$. The operators A, B, L may be unbounded;
here, $B(-u) = -Bu$ for all $u \in D$.


We are given the admissible set of elements

$$M_R = \{u \in D : \|Lu\|_G \le R\}, \qquad 0 < R = \mathrm{const} < +\infty.$$

On the basis of information about the element $u \in M_R$, characterizing the exact (or approximate)
value of the operator A on the element, $f = Au$ or $\tilde f \approx Au$, we have to compute the value of the
operator Bu.


The cases of both exact and approximate specification of the element f will be considered.
Assuming the existence of supplementary a priori information about the element u, we find an
effective lower bound for the error of the approximation, which is independent of the
approximating operator. In the case of a linear operator B and Hilbert spaces H, F, and G, we find
the approximation, best in the sense of some chosen criterion, to the operator B, both at every
point f (or $\tilde f$) and in the entire set of data. In the case of approximation at a point, the optimal
operator is shown to be unique.


The existence of the optimal operator is proved by a functional method and is not based on 
geometric considerations as e.g., in [3]. Another important point is that the optimal operator itself 
is found during the proof of existence; this operator proves to be linear. Moreover, regardless of 
the concrete form of the operator B, the optimal operator has the following structure: $T_{\mathrm{opt}} = BS$,
where S is independent of the operator B, and is fully defined by the initial data of the problem. 
This means that operators which are optimal in a class of problems can be constructed. 


1. Exactly specified information 


1. Assume that the information about the element u is specified exactly. We introduce the
set of all data $N_R$:

$$N_R = \{f \in F : f = Au,\ u \in M_R\}.$$

The set $M_R \neq \varnothing$, so that $N_R$ is also non-empty.

Given the fixed element $f \in N_R$, we introduce the set

$$U_R(f) = \{u \in M_R : Au = f\}.$$


Concerning the element u, on which we want to find the value of the operator B, assume that
it is known a priori that $u \in U_R(f)$. Denote by T any operator (not necessarily linear) which is
defined in $N_R$ and maps $N_R$ into V. The error of approximation of the operator B by means of the
operator T on the set $N_R$ will be characterized by the function

$$\omega_B(R, T) = \sup_{f \in N_R}\ \sup_{u \in U_R(f)} \|Bu - Tf\|_V.$$

We put

$$\omega_B(R) = \sup \|Bu\|_V, \qquad u \in U_R = \{u \in M_R : Au = 0\}.$$

It is assumed that $\omega_B(R)$ is defined (finite) for all R > 0. For this, it is sufficient that the operator
B satisfy the B-complementarity condition [7]

$$\|Bu\|_V^2 \le \gamma_B\left(\|Au\|_F^2 + \|Lu\|_G^2\right) \quad \forall u \in D, \qquad 0 < \gamma_B = \mathrm{const} < +\infty.$$
We have:

Theorem 1

Under the above assumptions, for any admissible operator T we have the lower bound

$$\omega_B(R, T) \ge \omega_B(R). \qquad (1)$$


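Proof. Let $h_\varepsilon \in U_R$ be an element such that

$$\|Bh_\varepsilon\|_V \ge \omega_B(R) - \varepsilon,$$

where $\varepsilon > 0$ is an arbitrary number. It is obvious that the element $(-h_\varepsilon) \in U_R$, and since the
operator B is homogeneous, $\omega_B(R) \le \|B(-h_\varepsilon)\|_V + \varepsilon$. By the definition of $\omega_B(R, T)$, we have

$$\omega_B(R, T) \ge \max\{\|Bh_\varepsilon - T\theta\|_V,\ \|Bh_\varepsilon + T\theta\|_V\},$$

where $\theta$ is the zero of the space F. Since

$$2Bh_\varepsilon = (Bh_\varepsilon - T\theta) + (Bh_\varepsilon + T\theta),$$

we obtain, using the triangle inequality,

$$2\|Bh_\varepsilon\|_V \le \|Bh_\varepsilon - T\theta\|_V + \|Bh_\varepsilon + T\theta\|_V \le 2\max\{\|Bh_\varepsilon - T\theta\|_V,\ \|Bh_\varepsilon + T\theta\|_V\}.$$

Hence

$$\omega_B(R, T) \ge \|Bh_\varepsilon\|_V \ge \omega_B(R) - \varepsilon.$$

Recalling that $\varepsilon$ is arbitrary, the theorem now follows.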


Notes. 1. If the kernel of the operator A, $N_A = \{u \in D : Au = 0\}$, consists solely of zero, i.e., A
is invertible, then $U_R = \{0\}$. Then, obviously, $\omega_B(R) = 0$ for all R > 0. It is therefore natural to
require that $\dim N_A \ge 1$.

2. The same device was used in the proof of Theorem 1 as was used for the proof of a
similar theorem in [4]; see also [7].


The lower bound (1) has been proved for a non-linear operator B. It will be shown below that 
the bound can be reached in the case of a linear operator B. 


2. We introduce the error of approximation of the operator B by an operator T at the point f:

$$\omega_B(R, T, f) = \sup_{u \in U_R(f)} \|Bu - Tf\|_V.$$

Consider the following problems of optimal approximation of the operator B: to find the operator
$T_0$, optimal at a point, i.e., such that

$$\omega_B(R, T_0, f) = \inf_T \omega_B(R, T, f) = \omega_B(R, f); \qquad (2)$$

and to find the operator $P_0$, optimal in $N_R$, i.e., such that

$$\omega_B(R, P_0) = \inf_T \omega_B(R, T) = \omega_B(R, N_R). \qquad (3)$$
We shall assume now that H, F, and G are Hilbert spaces. For fixed $f \in N_R$ we introduce the
set

$$U(f) = \{u \in D : Au = f\}.$$

To solve problems (2) and (3), we consider an auxiliary problem: to find the element $u_f \in D$ such
that

$$\|Lu_f\|_G = \inf \|Lu\|_G, \qquad u \in U(f). \qquad (4)$$


We shall say that the operators A and L are jointly closed in D if, given any sequence of
elements $u_n \in D$ such that

$$\lim_{n\to\infty} u_n = u_0 \ (\text{in } H), \qquad \lim_{n\to\infty} Au_n = f_0 \ (\text{in } F), \qquad \lim_{n\to\infty} Lu_n = g_0 \ (\text{in } G),$$

it follows that $u_0 \in D$, and $Au_0 = f_0$, $Lu_0 = g_0$.


It was shown in [7] that, provided the operators are jointly closed, an element $u_f \in D$, which
we shall call the L-pseudo-solution, exists and is unique for every $f \in A[D]$, provided that, for any
$u \in D$, we have the complementarity condition

$$\|u\|_H^2 \le \gamma\left(\|Au\|_F^2 + \|Lu\|_G^2\right) = \gamma\|u\|_{AL}^2, \qquad 0 < \gamma = \mathrm{const} < +\infty. $$

A linear operator $S_0$: $S_0 f = u_f$ is thereby defined, with domain of definition $D_{S_0} = A[D]$.

Notice that the equation Au = f is solvable for any $f \in F$ if $Q_A = \{f \in F : f = Au,\ u \in D\} = F$.
In this case, the operator $S_0$ is defined in the whole of F, and hence is bounded as a mapping into D,
equipped with the norm $\|u\|_{AL}$.


Lemma 1

For all $u \in U(f)$ we have

$$\|Lu_f - Lu\|_G \le \|Lu\|_G. \qquad (5)$$

Proof. From Euler's identity for problem (4) we obtain, for all $u \in U(f)$:

$$(Lu_f - Lu,\ Lu_f)_G = 0. \qquad (6)$$

From the obvious identity

$$\|Lu_f - Lu\|_G^2 = \|Lu\|_G^2 - 2(Lu - Lu_f,\ Lu_f)_G - \|Lu_f\|_G^2$$

and Eq. (6), we obtain, for any $u \in U(f)$, the Pythagoras equation

$$\|Lu\|_G^2 = \|Lu_f\|_G^2 + \|Lu - Lu_f\|_G^2. \qquad (7)$$

The inequality (5) follows obviously from (7). The lemma is proved.
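
The Pythagoras equation (7) can be checked numerically in a small finite-dimensional model. The sketch below is only an illustration under assumptions that are not in the text: H = R^3, the operator A is the sum of the components, L is a diagonal weighting, and the names `pseudo_solution` and `L_norm_sq` are hypothetical helpers. In this model the L-pseudo-solution follows from a Lagrange multiplier argument.

```python
def pseudo_solution(f, w):
    """Minimise sum((w[i]*u[i])**2) subject to sum(u) == f.

    Lagrange multipliers give u[i] proportional to 1 / w[i]**2.
    """
    inv = [1.0 / (wi * wi) for wi in w]
    s = sum(inv)
    return [f * v / s for v in inv]


def L_norm_sq(u, w):
    """Squared norm ||Lu||^2 for the diagonal operator L = diag(w)."""
    return sum((wi * ui) ** 2 for wi, ui in zip(w, u))


w = [1.0, 2.0, 4.0]          # assumed weights defining L
f = 3.0                      # exact datum f = Au
u_f = pseudo_solution(f, w)  # L-pseudo-solution
u = [2.0, 0.5, 0.5]          # another element of U(f): its components sum to 3

diff = [a - b for a, b in zip(u, u_f)]
lhs = L_norm_sq(u, w)                           # ||Lu||^2
rhs = L_norm_sq(u_f, w) + L_norm_sq(diff, w)    # ||Lu_f||^2 + ||Lu - Lu_f||^2
print(lhs, rhs)
```

The two printed numbers agree up to rounding, and dropping the non-negative term for u_f from the right side gives inequality (5).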


Note 3. If $f = Au$, $u \in N_L$, where the set $N_L = \{u \in D : Lu = 0\}$, then it can easily be seen that we
have the relation $u_f = u$ for all such f.


Theorem 2

If B is a linear operator, then the same linear operator is a solution of problems (2) and (3),
namely,

$$T_0 = BS_0,$$

the absolute values of the errors being respectively

$$\omega_B(R, f) = \sup \|Bu - BS_0 f\|_V, \qquad u \in U_R(f), \qquad (8)$$

$$\omega_B(R, N_R) = \omega_B(R). \qquad (9)$$

If V is a Hilbert space, then $T_0$ is the unique solution of problem (2).


Proof. For the pseudo-solution $u_f$ and arbitrary $u \in U(f)$ we have

$$\|L(2u_f - u)\|_G = \|Lu\|_G.$$

Hence, for any $u \in U_R(f)$, the element $(2u_f - u)$ also lies in $U_R(f)$. Moreover, for any operator T, any
$u \in U_R(f)$ and any $f \in N_R$ we have the obvious inequality

$$\omega_B(R, T, f) \ge \max\{\|B(2u_f - u) - Tf\|_V,\ \|Bu - Tf\|_V\}. \qquad (10)$$

From the equation

$$2B(u_f - u) = B(2u_f - u) - Tf + (Tf - Bu),$$

which holds by virtue of the fact that the operator B is linear, we obtain with the aid of the triangle
inequality

$$2\|Bu_f - Bu\|_V \le 2\max\{\|B(2u_f - u) - Tf\|_V,\ \|Bu - Tf\|_V\}.$$

We then obtain from (10):

$$\omega_B(R, T, f) \ge \sup \|Bu - BS_0 f\|_V, \qquad u \in U_R(f). \qquad (11)$$

Obviously, the equality is reached in (11) for $T = BS_0$, whence it follows that $T_0$ is the solution
of problem (2), and Eq. (8) holds.


If $u \in U_R(f)$, then, by the properties of the pseudo-solution, we have $A(u - u_f) = 0$. From
(5) we obtain

$$\|L(u - u_f)\|_G \le R.$$

Consequently, the element $(u - u_f) \in U_R$ for any $u \in U_R(f)$ and any $f \in N_R$. Hence, for any $f \in N_R$,

$$\omega_B(R) = \sup_{u \in U_R} \|Bu\|_V \ge \sup_{u \in U_R(f)} \|Bu - Bu_f\|_V.$$

It then follows from (1) that, for any T,

$$\omega_B(R, T) \ge \omega_B(R) \ge \sup_{f \in N_R}\ \sup_{u \in U_R(f)} \|Bu - BS_0 f\|_V. \qquad (12)$$

The extreme terms in (12) are obviously the same for $T = T_0$, and hence $T_0$ is also a solution of
problem (3). Equation (9) follows obviously from (12).


Now let V be a Hilbert space. We shall prove that the operator $T_0$, optimal at a point, is
unique. Let T be another optimal operator. We form the operator $T_{1/2}$ such that

$$T_{1/2} f = \tfrac{1}{2} Tf + \tfrac{1}{2} BS_0 f, \qquad f \in N_R.$$

The operator $T_{1/2}$ is also optimal at a point, since

$$\sup_{u \in U_R(f)} \|Bu - T_{1/2} f\|_V \le \tfrac{1}{2} \sup_{u \in U_R(f)} \|Bu - Tf\|_V + \tfrac{1}{2} \sup_{u \in U_R(f)} \|Bu - BS_0 f\|_V = \omega_B(R, f).$$


On applying the parallelogram equation and recalling that the operators $T$, $BS_0$, $T_{1/2}$ are optimal,
we obtain for arbitrary T the inequality

$$\omega_B^2(R, T, f) \ge \omega_B^2(R, f) + \tfrac{1}{4}\|BS_0 f - Tf\|_V^2.$$

Hence it is clear that, to reach the optimal error $\omega_B(R, f)$, it is necessary that

$$\|BS_0 f - Tf\|_V = 0.$$

This relation proves the uniqueness.


Note 4. Using (8) and Note 3, we get

$$\omega_B(R, f) = 0 \quad \forall f:\ f = Au,\ u \in N_L,$$

i.e., on elements of the kernel of the operator L, the algorithm $T_0$ gives exact values for any
admissible operator B.


Let a linear operator $B_0$ be given in $D_B$, acting from H into V, such that

$$c_1\|B_0 u\|_V \le \|Bu\|_V \le c_2\|B_0 u\|_V,$$

where $c_i > 0$ are constants, independent of $u \in D_B$. We then have:

Lemma 2. Under the conditions stated, we have the estimates

$$c_1\,\omega_{B_0}(R) \le \omega_B(R) \le c_2\,\omega_{B_0}(R),$$

where

$$\omega_{B_0}(R) = \sup \|B_0 u\|_V, \qquad u \in U_R.$$

The proof is obvious.

Lemma 2 shows that the function $\omega_{B_0}(R)$ has the same order in R as the function $\omega_B(R)$,
though the evaluation of it, when finding the order of accuracy of the approximation of the
operator B, can sometimes prove to be far simpler.


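3. Notice that the optimal operator $T_0$ is independent of the number R; only the errors
$\omega_B(R, f)$ and $\omega_B(R, N_R)$ depend on R. We shall consider as a measure of the accuracy of
approximation of the operator B the quantities

$$\sigma_B(T, f) = \sup\left\{\|Bu - Tf\|_V / \|Lu\|_G\right\}, \qquad u \in U(f), \quad \|Lu\|_G \neq 0,$$

$$\sigma_B(T) = \sup \sigma_B(T, f), \qquad f \in A[D].$$

We shall seek the operators $\hat T_0$ and $\hat P_0$ such that

$$\sigma_B(\hat T_0, f) = \inf_T \sigma_B(T, f) = \sigma_B(f), \qquad (13)$$

$$\sigma_B(\hat P_0) = \inf_T \sigma_B(T) = \sigma_B. \qquad (14)$$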

The proof of the following theorem is similar to the proof of Theorem 2. 
Theorem 3

If B is a linear operator, then the same linear operator

$$T_0 = \hat T_0 = \hat P_0$$

provides a solution of problems (13) and (14); here,

$$\sigma_B(f) = \sup\left\{\|Bu - BS_0 f\|_V / \|Lu\|_G\right\}, \qquad u \in U(f), \quad \|Lu\|_G \neq 0,$$

$$\sigma_B = \sup \sigma_B(f), \qquad f \in A[D].$$


4. Let us give some examples. Let $D = D_L$ and

$$Au = (A_1 u, \ldots, A_n u),$$

where $A_i$ is a linear bounded operator, defined in D and mapping the Hilbert space H into the
Hilbert space $F_i$. We define the space F as the Cartesian product of the spaces $F_i$:

$$F = F_1 \times F_2 \times \ldots \times F_n$$

with the norm $\|f\|_F^2 = \|f_1\|_{F_1}^2 + \ldots + \|f_n\|_{F_n}^2$, $f = (f_1, \ldots, f_n) \in F$.

It follows from Theorem 2 that the method of operator (and also functional) interpolational
splines (see [7], p. 278) is optimal for evaluating the values of a linear operator B, given the
a priori information $u \in U_R(f)$, $f \in N_R$.


It can be shown in a similar way that the modified method of collocation, see [11, 12], is 
also optimal. Consider some simple examples: 


Example 1. Let $H = W_2^1[0, 1]$, $G = L_2[0, 1]$, $F = R_{n+1}$, $V = C[0, 1]$, $L = d/dx$, $A_i u(x) = u(x_i)$,
$x_i = ih$, $i = 0, 1, \ldots, n$, $h = 1/n$, $B = E$. Assume that it is known that

$$u \in M_R = \left\{u : \int_0^1 [u'(x)]^2\,dx \le R^2\right\}.$$

This case corresponds to finding the method of best uniform approximation of the function u(x)
on the set $M_R$.

By Theorem 2, the best uniform approximation is the interpolational spline of the first degree
(a polygonal line, i.e., a piecewise-linear function).
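
As a rough illustration of Example 1, the sketch below evaluates the first-degree interpolating spline on the uniform grid and compares its uniform error with a simple a priori bound. Several things here are assumptions of the sketch and not part of the text: the test function u(x) = x^2, the helper name `linear_spline`, and the bound R*sqrt(h)/2, which is an elementary estimate for a function that vanishes at both ends of an interval of length h and whose derivative has L2 norm at most R.

```python
import math


def linear_spline(values, x):
    """Evaluate the first-degree interpolating spline on the grid x_i = i/n."""
    n = len(values) - 1
    i = min(int(x * n), n - 1)   # index of the subinterval containing x
    t = x * n - i                # local coordinate in [0, 1]
    return (1.0 - t) * values[i] + t * values[i + 1]


n = 4
h = 1.0 / n


def u(x):
    """Test function (an assumption of this sketch)."""
    return x * x


R = math.sqrt(4.0 / 3.0)                  # L2[0,1] norm of u'(x) = 2x
f = [u(i * h) for i in range(n + 1)]      # exact data f_i = u(x_i)

grid = [j / 1000.0 for j in range(1001)]
max_err = max(abs(u(x) - linear_spline(f, x)) for x in grid)
bound = R * math.sqrt(h) / 2.0            # assumed a priori bound, see lead-in
print(max_err, bound)
```

For this smooth test function the observed error is far below the bound; the bound itself is attained only by tent-shaped functions concentrated on one subinterval.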


Example 2. Let $H = W_2^2[0, 1]$, $G = L_2[0, 1]$, $F = R_{n+1}$, $V = C[0, 1]$, $L = d^2/dx^2$, $A_i u(x) = u(x_i)$,
$x_i = ih$, $i = 0, 1, \ldots, n$, $h = 1/n$, $B = d/dx$, i.e., we consider the problem of best uniform
approximation of the first derivative of the function u(x). Assume that it is known that

$$u \in M_R = \left\{u : \int_0^1 [u''(x)]^2\,dx \le R^2\right\}.$$

By Theorem 2, the best approximation of u'(x) is the function s'(x), where s(x) is the cubic
interpolational spline.


The results described above can be extended in a natural way to the case of unsolvable
equations Au = f. In this case, the element f has to be replaced by the element Pf, where P is the
operator of orthogonal projection onto the set $\overline{Q_A}$, if $Pf \in Q_A$ (we have in mind the case when the
closure of $Q_A$ in F is not the same as $Q_A$).
2. Approximately specified information


1. Let us take the case when, instead of the element f, the approximation $\tilde f$ of it is specified.
Given fixed $f \in N_R$, we put

$$N_\delta(f) = \{\tilde f \in F : \|f - \tilde f\|_F \le \delta\}, \qquad \delta > 0.$$

We introduce the set of all approximate data:

$$N_{\delta,R} = \bigcup_f N_\delta(f), \qquad f \in N_R.$$

For the numerical parameters $\delta > 0$, R > 0, we define the set (the element $\tilde f \in N_{\delta,R}$ is fixed)

$$U_{\delta,R}(\tilde f) = \{u \in M_R : \|Au - \tilde f\|_F \le \delta\}.$$

Obviously, $U_R(f) \subseteq U_{\delta,R}(\tilde f)$ for any $\delta > 0$, R > 0. We assume that it is known a priori that the
element u, on which the value of the operator B is evaluated, belongs to the set $U_{\delta,R}(\tilde f)$.


Let T be any (not necessarily linear) operator, defined on F, and mapping F into V. We
characterize the error of approximation of the operator B by the quantity

$$\omega_B(\delta, R, T) = \sup_{\tilde f}\ \sup_{u} \|Bu - T\tilde f\|_V, \qquad u \in U_{\delta,R}(\tilde f), \quad \tilde f \in N_{\delta,R}.$$

We introduce the set

$$U_{\delta,R} = \{u \in M_R : \|Au\|_F \le \delta\}$$

and the quantity

$$\omega_B(\delta, R) = \sup \|Bu\|_V, \qquad u \in U_{\delta,R},$$

which we shall assume to be finite for all $\delta > 0$, $0 < R < +\infty$.


Theorem 4

Under the above assumptions, for any operator T we have the lower bound

$$\omega_B(\delta, R, T) \ge \omega_B(\delta, R).$$

The proof follows the same lines as the proof of Theorem 1.

Let us emphasize that, in Theorems 1 and 4, all the spaces are assumed to be normed, and
the operator B is not necessarily linear.


If A = E and B is linear, Theorems 1 and 4 are the same as the theorems proved in [7]. The
following problem was stated in [7]: to find the operator $T_{\mathrm{opt}}$ from the condition

$$\omega_B(\delta, R, T_{\mathrm{opt}}) = \inf_T \omega_B(\delta, R, T). \qquad (15)$$

The quasi-optimal (order-wise optimal) operator was also defined there, i.e., an operator $T_{K\,\mathrm{opt}}$ for
which a constant K, $0 < K < +\infty$, exists such that

$$\omega_B(\delta, R, T_{K\,\mathrm{opt}}) \le K\,\omega_B(\delta, R, T_{\mathrm{opt}}).$$

It was shown in [7] that the "smoothing" operators, constructed in the following ways, are also
quasi-optimal:

a) on the basis of a choice of regularization parameter $\alpha$ from the condition $\|Au_\alpha - \tilde f\|_F = \delta$,
where $u_\alpha$ is the regularizing family of elements;

b) on the basis of the discrepancy method;

c) on the basis of quasi-solutions;

d) on the basis of a determinate Bayes approach.

The operators corresponding to a) and d) are linear.


If the criterion (15) is chosen, it is difficult to determine the optimal algorithm $T_{\mathrm{opt}}$, since
the set $U_{\delta,R}(\tilde f)$ has a complicated structure. The natural way out of the difficulty is to take a
different but closely similar criterion, such that the structure of the optimal operator is not
influenced by the "geometry" of the set $U_{\delta,R}(\tilde f)$.


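2. Assume that approximate information is given about the element $u \in D$, in the form of an
element $\tilde f \in N$, where N is some arbitrary set of F. For the numerical parameter $\lambda > 0$ we introduce
the functional (see [1])

$$\Phi_\lambda[u, \tilde f] = \lambda\|Au - \tilde f\|_F^2 + \|Lu\|_G^2.$$

We take the following measures of the approximation of the operator B:

a) at the point $\tilde f$,

$$\omega_B(\lambda, T, \tilde f) = \sup_{u \in D}\left\{\|Bu - T\tilde f\|_V / \Phi_\lambda^{1/2}[u, \tilde f]\right\},$$

b) in the entire set N,

$$\omega_B(\lambda, T) = \sup_{\tilde f} \omega_B(\lambda, T, \tilde f), \qquad \tilde f \in N.$$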

We can state the following problems of optimal approximation of the operator B: to find
the operator $T_\lambda$, optimal at a point, from the condition

$$\omega_B(\lambda, T_\lambda, \tilde f) = \inf_T \omega_B(\lambda, T, \tilde f) = \omega_B(\lambda, \tilde f); \qquad (16)$$

to find the operator $P_\lambda$, optimal in N, from the condition

$$\omega_B(\lambda, P_\lambda) = \inf_T \omega_B(\lambda, T) = \omega_B(\lambda, N). \qquad (17)$$

Here the infimum is taken over all operators T, defined in the set N.


We shall henceforth assume that H, F, and G are Hilbert spaces. Before proceeding to the
solution of problems (16) and (17), consider the following auxiliary regularized problem: to find
the element $u_\lambda \in D$ such that

$$\Phi_\lambda[u_\lambda, g] = \inf_{u \in D} \Phi_\lambda[u, g], \qquad g \in F. \qquad (18)$$


It was shown in [7] that, if the complementarity condition holds and the operators A and L
are jointly closed, the element $u_\lambda$ exists and is unique for any $g \in F$. We can then define, on the
basis of a solution of problem (18), a single-parameter family of "smoothing" operators $S_\lambda$, such
that the element $u_\lambda$: $S_\lambda g = u_\lambda$ for all $g \in F$, is associated with the element g. We shall call the
element $u_\lambda$ the regularized pseudo-solution. For the case D = H, and a bounded operator A, the
operator $S_\lambda$ can be written explicitly as

$$S_\lambda = \lambda(\lambda A^*A + L^*L)^{-1}A^*.$$

The operator $S_\lambda$ is obviously linear and bounded in F.
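
A minimal sketch of the smoothing operator in the simplest possible setting may help fix the formula. It assumes (this is not in the text) that A and L are diagonal matrices on R^3, so that the inverse in the explicit expression reduces to a componentwise division; the names `smooth` and `phi` are hypothetical. The sketch checks that perturbing the regularized pseudo-solution in any coordinate increases the functional of problem (18).

```python
def smooth(lam, a, l, g):
    """Apply S_lam = lam*(lam*A^T A + L^T L)^(-1) A^T for A = diag(a), L = diag(l)."""
    return [lam * ai * gi / (lam * ai * ai + li * li)
            for ai, li, gi in zip(a, l, g)]


def phi(lam, a, l, u, g):
    """Functional Phi_lam[u, g] = lam*||Au - g||^2 + ||Lu||^2."""
    misfit = sum((ai * ui - gi) ** 2 for ai, ui, gi in zip(a, u, g))
    penalty = sum((li * ui) ** 2 for li, ui in zip(l, u))
    return lam * misfit + penalty


a = [1.0, 0.1, 0.01]     # assumed diagonal of A (poorly conditioned on purpose)
l = [1.0, 1.0, 1.0]      # assumed diagonal of L
g = [1.0, 1.0, 1.0]      # data
lam = 100.0

u_lam = smooth(lam, a, l, g)
best = phi(lam, a, l, u_lam, g)

# Value of the functional after shifting one coordinate of u_lam by +-0.1.
perturbed = [
    phi(lam, a, l, [ui + d if i == k else ui for i, ui in enumerate(u_lam)], g)
    for k in range(3)
    for d in (-0.1, 0.1)
]
print(u_lam, best, min(perturbed))
```

In this diagonal model the components with small a_i are damped rather than amplified by 1/a_i, which is the smoothing effect the parameter is meant to provide.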


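Theorem 5

If the operator B is linear, then, given any $\lambda > 0$, a solution of problem (16) is provided by
the linear operator

$$T_\lambda = BS_\lambda.$$

If $\lambda$ is independent of $\tilde f$, the operator $T_\lambda$ is also a solution of problem (17). Here,

$$\omega_B(\lambda, \tilde f) = \sup\left\{\|Bu - B\tilde u_\lambda\|_V / \Phi_\lambda^{1/2}[u, \tilde f]\right\}, \qquad (19)$$

$$\omega_B(\lambda, N) = \sup_{\tilde f} \omega_B(\lambda, \tilde f), \qquad \tilde f \in N, \qquad (20)$$

where $\tilde u_\lambda = S_\lambda \tilde f$.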


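Proof. It can easily be shown that, for the regularized pseudo-solution $\tilde u_\lambda$ and any $h \in D$, we
have

$$\Phi_\lambda[2\tilde u_\lambda - h, \tilde f] = \Phi_\lambda[h, \tilde f].$$

Further, given any operator T and any $h \in D$, we have

$$\omega_B(\lambda, T, \tilde f) \ge \max\left\{\frac{\|B(2\tilde u_\lambda - h) - T\tilde f\|_V}{\Phi_\lambda^{1/2}[2\tilde u_\lambda - h, \tilde f]},\ \frac{\|Bh - T\tilde f\|_V}{\Phi_\lambda^{1/2}[h, \tilde f]}\right\} = \max\left\{\frac{\|B(2\tilde u_\lambda - h) - T\tilde f\|_V}{\Phi_\lambda^{1/2}[h, \tilde f]},\ \frac{\|Bh - T\tilde f\|_V}{\Phi_\lambda^{1/2}[h, \tilde f]}\right\}. \qquad (21)$$

Since the operator B is linear, we have

$$2B(\tilde u_\lambda - h) = B(2\tilde u_\lambda - h) - T\tilde f + (T\tilde f - Bh),$$

whence, applying the triangle inequality,

$$2\|B\tilde u_\lambda - Bh\|_V \le 2\max\{\|B(2\tilde u_\lambda - h) - T\tilde f\|_V,\ \|T\tilde f - Bh\|_V\}.$$

Hence we obtain from (21), for any $\tilde f \in N$ and any T,

$$\omega_B(\lambda, T, \tilde f) \ge \sup\left\{\|Bu - BS_\lambda \tilde f\|_V / \Phi_\lambda^{1/2}[u, \tilde f]\right\}, \qquad u \in D, \qquad (22)$$

and the sign of equality holds for $T = BS_\lambda$. This in fact implies that the operator $T_\lambda$ is optimal at a
point.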
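
The two facts that drive this part of the proof, the reflection symmetry of the functional about the regularized pseudo-solution and the resulting lower bound for any candidate value of the approximating operator at the given datum, can be checked in a scalar toy model. Everything concrete below is an assumption of the sketch: A, L and B act on real numbers as multiplication by `a`, `l` and `b`, and the helper names `u_reg`, `phi` and `worst` are hypothetical.

```python
def u_reg(lam, a, l, f):
    """Scalar regularized pseudo-solution: minimiser of lam*(a*u - f)^2 + (l*u)^2."""
    return lam * a * f / (lam * a * a + l * l)


def phi(lam, a, l, u, f):
    """Scalar version of the functional Phi_lam[u, f]."""
    return lam * (a * u - f) ** 2 + (l * u) ** 2


lam, a, l, b, f = 4.0, 0.5, 1.0, 3.0, 2.0
u_lam = u_reg(lam, a, l, f)
h = -1.25                      # arbitrary element h
mirror = 2.0 * u_lam - h       # its reflection about u_lam

# Reflection symmetry: both values of the functional coincide.
phi_h = phi(lam, a, l, h, f)
phi_mirror = phi(lam, a, l, mirror, f)


def worst(t):
    """Larger of the two errors when the approximating operator returns t."""
    return max(abs(b * mirror - t), abs(b * h - t))


lower = abs(b * u_lam - b * h)             # bound obtained from the triangle inequality
candidates = [-10.0, 0.0, 3.0, b * u_lam, 9.0, 20.0]
print(phi_h, phi_mirror, lower, [worst(t) for t in candidates])
```

Among the candidate values only t = b * u_lam attains the lower bound, which mirrors the statement that equality in (22) holds for the regularized operator.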


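Now let $\lambda$ be independent of the choice of the element $\tilde f$. We then take the supremum in both
sides of the inequality (22) with respect to $\tilde f \in N$. We get

$$\sup_{\tilde f \in N}\ \sup_{u \in D}\left\{\|Bu - T\tilde f\|_V / \Phi_\lambda^{1/2}[u, \tilde f]\right\} \ge \sup_{\tilde f \in N}\ \sup_{u \in D}\left\{\|Bu - BS_\lambda \tilde f\|_V / \Phi_\lambda^{1/2}[u, \tilde f]\right\}. \qquad (23)$$

Since the operator $BS_\lambda$ is independent of u and $\tilde f$, we can substitute $T = BS_\lambda$ on the left side of (23).
The equality is then obtained in (23). Consequently, $P_\lambda = T_\lambda$.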





Equations (19) and (20) follow from (22) and (23) respectively. The theorem is proved. 


It is easily shown that, when the B-complementarity condition holds, $\omega_B(\lambda, N)$ is finite for
any $\lambda > 0$.


3. The optimality criteria (16) and (17) have a certain universality. First, whatever the choice
of the parameter $\lambda$, the method based on regularization proves to be optimal in the sense of the
criterion. It is natural to try to find an optimal operator which also ensures stable evaluation of the
values of Bu. Such an operator can only be found when supplementary a priori information is
available. A second feature of the criterion is that supplementary a priori information (about the
set N) does not influence the structure of the optimal operator, and can only affect the choice of
the parameter $\lambda$. In this way, the construction of optimal operators for different initial data can be
reduced to a suitable choice of the parameter $\lambda$.


Notice that, in the case of optimization at a point, it is admissible for the parameter $\lambda$ to
be dependent on the function $\tilde f$. For instance, if the number $\delta > 0$ is given, such that $\|Au - \tilde f\|_F \le \delta$,
then a parameter $\lambda = \lambda(\delta, \tilde f)$ can be chosen from the condition $\|A\tilde u_\lambda - \tilde f\|_F = \delta$. The resulting
operator $T_{\lambda(\delta, \tilde f)}$ will be optimal in the sense of the criterion (16).
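
The choice of the parameter from the discrepancy condition can be sketched as a one-dimensional root search. The model below is an assumption made for illustration only: A and L are diagonal on R^3 with non-zero entries, the noise level is smaller than the norm of the data, and the helper names `residual_norm` and `choose_lambda` are hypothetical. In this model the residual decreases monotonically as the parameter grows, so bisection on a logarithmic scale suffices.

```python
import math


def residual_norm(lam, a, l, f):
    """||A u_lam - f|| for A = diag(a), L = diag(l), u_lam = S_lam f.

    Componentwise, a*u_lam - f = -l^2 * f / (lam*a^2 + l^2).
    """
    return math.sqrt(sum((li * li * fi / (lam * ai * ai + li * li)) ** 2
                         for ai, li, fi in zip(a, l, f)))


def choose_lambda(delta, a, l, f, lo=1e-12, hi=1e12, iters=200):
    """Bisection in log(lambda) for residual_norm(lambda) = delta."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if residual_norm(mid, a, l, f) > delta:
            lo = mid    # residual too large: give the misfit term more weight
        else:
            hi = mid
    return math.sqrt(lo * hi)


a = [1.0, 0.5, 0.1]      # assumed diagonal of A
l = [1.0, 1.0, 1.0]      # assumed diagonal of L
f = [1.0, 0.8, 0.3]      # approximate data
delta = 0.1              # assumed noise level, smaller than the norm of f

lam = choose_lambda(delta, a, l, f)
print(lam, residual_norm(lam, a, l, f))
```

The bracket [1e-12, 1e12] is an arbitrary choice of this sketch; a practical implementation would first verify that the residual changes sides of the noise level over the bracket.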


In the case of optimization in the set N, the parameter $\lambda$ has to be independent of the choice
of $\tilde f$, and has to be defined for the entire set N. Let $N = N_{\delta,R}$, i.e., the characteristics of N are the
numbers $\delta > 0$ and R > 0. An example of a choice of $\lambda$ dependent on the entire set N is
$\lambda = \bar\lambda = R^2/\delta^2$ (which leads to the so-called determinate Bayes method of regularization [7]). The
operator $T_{\bar\lambda}$ is optimal in $N_{\delta,R}$ in the sense of the criterion (17).


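As we remarked earlier, it was shown in [7] that the operators $T_{\bar\lambda}$ and $T_{\lambda(\delta, \tilde f)}$ are
quasi-optimal with respect to the criterion (15); in fact,

$$\sup_{\tilde f \in N_{\delta,R}}\ \sup_{u \in U_{\delta,R}(\tilde f)} \|Bu - BS_\lambda \tilde f\|_V \le 2^{1/2}\,\omega_B(\delta, R), \qquad \lambda = \bar\lambda,\ \lambda(\delta, \tilde f).$$

In [7] the conditions under which $\omega_B(\delta, R) \to 0$ as $\delta \to 0$ were stated. Obviously, under these
conditions the optimal operators $T_{\lambda(\delta, \tilde f)}$ and $T_{\bar\lambda}$ ensure stable computation of the values of the
operator B.

Some of the present results were published in [13].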


Translated by D. E. Brown. 
ON AN ITERATIVE PROJECTION ALGORITHM FOR SOLVING
ILL POSED PROBLEMS WITH AN APPROXIMATELY
SPECIFIED OPERATOR*


V.P. TANANA 
Sverdlovsk 
(Received 11 February 1975; revised 21 April 1975) 


A PROJECTION algorithm of the iterative type is proposed for the approximate solution of linear
operator equations of the first kind with an approximately specified right-hand side and operator
in Hilbert space.

An original method for solving an operator equation of the first kind with a disturbed operator,
representing an extension of the discrepancy method [3-5], was described in [1], in the context
of compact embedding [2]. A similar though somewhat different approach to the solution of
ill posed problems with a disturbed operator was considered in [6].


The basic idea of the method described in [1] lies in reducing the problem of the approximate
solution of the operator equation to a variational problem with non-linear (non-convex) constraints;
but the solution of this latter problem is quite difficult and requires the development of special
methods.

In the present paper the method described in [1] is justified for linear operator equations
without compact embedding; this is extremely important when solving the inverse problem of
gamma logging of wells [7], in which the exact solution (radioactive element content) is often a
discontinuous function, about which no a priori information is available. Further, it is shown, under
the same assumptions, that the method described in [1] can be reduced to a method of Tikhonov
regularization [8], with a parameter $\alpha$ chosen according to a generalized discrepancy principle
[9, 10]; and finally, an iterative projection algorithm is outlined and proved for realizing this
method.





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 15—23, 1977. 





1. Statement of the problem and method of solution 


Let X be an E space (see [11]), Y a Banach space, and A a linear one-to-one continuous
operator mapping X into Y. We consider the operator equation of the first kind

$$Ax = y, \qquad x \in X, \quad y \in Y. \qquad (1.1)$$

Assume that, for $y = y_0$, the equation has a solution $x_0$, but that $y_0$ and A are unknown to us; we
only know the element $y_\delta$ such that $\|y_\delta - y_0\| \le \delta$, and the linear continuous operator $A_h$, mapping
X into Y and satisfying the condition $\|A_h - A\| \le h$, where $\delta$ and h are positive numerical
parameters, and $\|y_\delta\| > \delta + \|x_0\|h$, $h < \|A_h\|$. Knowing $y_\delta$ and $A_h$, we want to construct the
approximate solution $x_{\delta h}$ of Eq. (1.1), satisfying the condition $x_{\delta h} \to x_0$ as $\delta + h \to 0$.

The method of solution amounts to reducing the problem of the approximate solution of
Eq. (1.1) to the variational problem

$$\inf\{\|x\|^\gamma : \|A_h x - y_\delta\| \le \delta + \|x\|h\}, \qquad \gamma \ge 1, \qquad (1.2)$$

see [1].
Theorem 1

Let the domain of values of the operator $A_h$ be everywhere dense in Y, $\overline{R_{A_h}} = Y$.
Then the variational problem (1.2) is equivalent to the problem

$$\inf\{\|x\|^\gamma : \|A_h x - y_\delta\| \le \delta + \tau h\} \qquad (1.3)$$

with the connection

$$\|x_{\delta h}^{\tau}\| = \tau. \qquad (1.4)$$

Proof. Let $0 \le \tau \le (\|y_\delta\| - \delta)/h$. Consider the function $\varphi(\tau) = \tau - \|x_{\delta h}^{\tau}\|$, where $x_{\delta h}^{\tau}$ is
the solution of the problem (1.3). The function $\varphi(\tau)$ is obviously monotonically increasing, and
satisfies the end conditions $\varphi(0) < 0$ and $\varphi((\|y_\delta\| - \delta)/h) > 0$. Hence a unique $\tau_0$ exists, at
which $\varphi(\tau_0) = 0$. Problem (1.3), (1.4) is thus uniquely solvable. We denote the solution of the
problem (1.3), (1.4) by $x_{\delta h}^{\tau_0}$ and we aim to show that the element $x_{\delta h}^{\tau_0}$ is the unique solution
of problem (1.2). For this, we observe that $x_{\delta h}^{\tau_0}$ satisfies the constraints in the problem (1.2).

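Assume that $x_{\delta h}^{\tau_0}$ is not a solution of problem (1.2). There will then be an element $\hat x \in X$
such that $\|A_h \hat x - y_\delta\| \le \delta + \|\hat x\|h$ and $\|\hat x\| \le \|x_{\delta h}^{\tau_0}\| - d$, where d > 0. The element $\hat x$ will then
satisfy the inequality $\|A_h \hat x - y_\delta\| \le \delta + \|x_{\delta h}^{\tau_0}\|h$, and noting that

$$\|x_{\delta h}^{\tau_0}\| = \inf_{x \in X}\{\|x\| : \|A_h x - y_\delta\| \le \delta + \|x_{\delta h}^{\tau_0}\|h\},$$

we obtain $\|\hat x\| \ge \|x_{\delta h}^{\tau_0}\|$, but this contradicts the inequality $\|\hat x\| < \|x_{\delta h}^{\tau_0}\|$.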
To prove the uniqueness, assume that $\tilde x$ is another solution of problem (1.2). This solution
will satisfy the conditions $\|\tilde x\| = \|x_{\delta h}^{\tau_0}\|$ and $\|A_h \tilde x - y_\delta\| \le \delta + \|x_{\delta h}^{\tau_0}\|h$, but X is strictly convex,
so that, on the basis of [12], we have $\tilde x = x_{\delta h}^{\tau_0}$. This proves the theorem.
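
The connection between the radius parameter and the norm of the constrained minimizer can be made concrete in a scalar toy case. The following sketch assumes (none of this is in the text) that X = Y is the real line, the perturbed operator is multiplication by a positive number `a` larger than `h`, and the datum `y` exceeds `delta`; the helper names are hypothetical. In that case the minimal-norm element of the constraint set is available in closed form, and the root of the monotone function from the proof is (y - delta) / (a + h).

```python
def x_tau(tau, a, y, delta, h):
    """Minimal-norm x with |a*x - y| <= delta + tau*h, for scalars a > 0, y > 0."""
    return max(0.0, (y - delta - tau * h) / a)


def solve_connection(a, y, delta, h, iters=100):
    """Bisection for tau - |x_tau| = 0 on the interval [0, (y - delta)/h]."""
    lo, hi = 0.0, (y - delta) / h
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - x_tau(mid, a, y, delta, h) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, y, delta, h = 2.0, 1.0, 0.1, 0.25     # assumed toy data with h < a and y > delta
tau0 = solve_connection(a, y, delta, h)
x0 = x_tau(tau0, a, y, delta, h)
print(tau0, x0, (y - delta) / (a + h))
```

At the root the constraint is active, that is, the residual equals delta + |x0| * h, which is the scalar form of the constraint in the original variational problem.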
It follows from Theorem 1 that, under our assumptions, problem (1.2) is uniquely
solvable. Recalling the results of [13], we can conclude from Theorem 1 that the variational
problem (1.2) is equivalent to the problem

$$\inf_{x \in X}\{\|A_h x - y_\delta\|^\gamma + \alpha\|x\|^\gamma\}$$

with the connection

$$\|A_h x_{\delta h}^{\alpha} - y_\delta\| = \delta + \|x_{\delta h}^{\alpha}\|h, \qquad \gamma \ge 1.$$

Henceforth, the approximate solution of Eq. (1.1) (the solution of the variational problem (1.2))
will be denoted by $x_{\delta h}$.
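
For the Hilbert-space case with exponent 2, the regularized problem together with its connection can be sketched as a search for the regularization parameter. The model is an assumption of this sketch and not the setting of the paper: the perturbed operator is a diagonal matrix on R^3 with non-zero entries, and `tikhonov`, `gap` and `choose_alpha` are hypothetical helper names. In this model the residual grows and the norm of the solution shrinks as the parameter increases, so the generalized discrepancy is monotone and bisection applies.

```python
import math


def tikhonov(alpha, a, y):
    """Minimiser of ||A x - y||^2 + alpha*||x||^2 for A = diag(a)."""
    return [ai * yi / (ai * ai + alpha) for ai, yi in zip(a, y)]


def gap(alpha, a, y, delta, h):
    """Generalized discrepancy ||A x_alpha - y|| - (delta + h*||x_alpha||)."""
    x = tikhonov(alpha, a, y)
    res = math.sqrt(sum((ai * xi - yi) ** 2 for ai, xi, yi in zip(a, x, y)))
    nrm = math.sqrt(sum(xi * xi for xi in x))
    return res - (delta + h * nrm)


def choose_alpha(a, y, delta, h, lo=1e-12, hi=1e12, iters=200):
    """Bisection in log(alpha); gap() increases with alpha in this model."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if gap(mid, a, y, delta, h) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


a = [1.0, 0.5, 0.1]        # assumed diagonal of the perturbed operator
y = [1.0, 0.8, 0.3]        # perturbed right-hand side
delta, h = 0.05, 0.02      # assumed error levels for the data and the operator

alpha = choose_alpha(a, y, delta, h)
x = tikhonov(alpha, a, y)
print(alpha, x, gap(alpha, a, y, delta, h))
```

The search bracket is arbitrary; the sketch relies on the gap being negative at its lower end and positive at its upper end, which holds for the data chosen here.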


Theorem 2

The element $x_{\delta h}$ is convergent to $x_0$ as $\delta + h \to 0$.

Proof. Assume the contrary, i.e., a sequence $x_{\delta_k h_k}$ exists, such that $\delta_k + h_k \to 0$ as $k \to \infty$ and

$$\|x_{\delta_k h_k} - x_0\| \ge d > 0. \qquad (1.5)$$

Since $\|x_{\delta_k h_k}\| \le \|x_0\|$ for any k, and the space X is reflexive, the sequence $\{x_{\delta_k h_k}\}$ must be
weakly compact. It can therefore be assumed without loss of generality that $x_{\delta_k h_k} \to \bar x$ weakly as
$k \to \infty$.

On the other hand, $\|Ax_{\delta_k h_k} - y_0\| \le 2(\delta_k + \|x_0\|h_k)$. Hence $Ax_{\delta_k h_k} \to y_0$ as $k \to \infty$.
Recalling that the operator A is linear and continuous, we get $\bar x = x_0$. In view of the fact
that $\|x_{\delta_k h_k}\| \le \|x_0\|$ for any k, and the fact that X is an E space, we have $x_{\delta_k h_k} \to x_0$ as $k \to \infty$;
but this contradicts (1.5).


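2. Method of finite-dimensional approximations of the
approximate solution $x_{\delta h}$

We consider the increasing chain of finite-dimensional subspaces of the space X

$$X_1 \subset X_2 \subset \ldots \subset X_n \subset \ldots$$

such that

$$\overline{\bigcup_{n} X_n} = X, \qquad (2.1)$$

and the variational problem

$$\inf\{\|x\|^\gamma : x \in X_n,\ \|A_h x - y_\delta\| \le \delta + \|x\|h\}. \qquad (2.2)$$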
Theorem 3

Problem (2.2) is uniquely solvable for sufficiently large n.

Proof. Let us first show that the set $\{x \in X_n : \|A_h x - y_\delta\| \le \delta + \|x\|h\}$ is not empty. For this,
we note that, in view of condition (2.1), there exist $X_N$ and $x' \in X_N$ such that $\|x' - x_0\| < \delta - \|y_\delta - y_0\|$,
but then, $\|A_h x' - y_\delta\| \le \delta + \|x'\|h$ and hence, for all $n \ge N$, the sets $\{x \in X_n : \|A_h x - y_\delta\| \le \delta + \|x\|h\}$
are not empty; and by Theorem 1, for $n \ge N$ problem (2.2) is uniquely solvable. This proves the
theorem.

Henceforth the solution of problem (2.2) will be denoted by $x_{\delta h}^{n}$.


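Theorem 4

The solutions $x_{\delta h}^{n}$ converge to $x_{\delta h}$ as $n \to \infty$.

Proof. Assume that the convergence $x_{\delta h}^{n} \to x_{\delta h}$ does not hold, i.e., a sequence $x_{\delta h}^{n_k}$
exists such that

$$\|x_{\delta h}^{n_k} - x_{\delta h}\| \ge d > 0. \qquad (2.3)$$

Since the sequence $x_{\delta h}^{n_k}$ is bounded, it is weakly compact, and we can thus assume without
loss of generality that $x_{\delta h}^{n_k} \to \bar x$ weakly as $k \to \infty$.

Since the operator $A_h$ is linear, given any $\varepsilon > 0$ we can find an element $x_{\delta h}^{\varepsilon}$ such that
$\|x_{\delta h}^{\varepsilon}\| < \|x_{\delta h}\| + \varepsilon$ and $\|A_h x_{\delta h}^{\varepsilon} - y_\delta\| < \delta + \|x_{\delta h}\|h$.

Recalling the continuity of the operator $A_h$ and the property of the system of subspaces $X_n$,
we can assert the existence of a sequence $\{x_{n_k}\}$, $x_{n_k} \in X_{n_k}$, such that $\|x_{n_k}\| = \|x_{\delta h}^{\varepsilon}\|$, $x_{n_k} \to x_{\delta h}^{\varepsilon}$
as $n_k \to \infty$ and $\|A_h x_{n_k} - y_\delta\| < \delta + \|x_{\delta h}\|h$. Then, $\|A_h x_{n_k} - y_\delta\| < \delta + \|x_{n_k}\|h$, and hence
$\|x_{\delta h}^{n_k}\| \le \|x_{n_k}\|$. Consequently, $\overline{\lim}_{n_k \to \infty} \|x_{\delta h}^{n_k}\| \le \|x_{\delta h}\| + \varepsilon$, and since $\varepsilon$ is arbitrary, we have

$$\overline{\lim_{k \to \infty}}\ \|x_{\delta h}^{n_k}\| \le \|x_{\delta h}\|. \qquad (2.4)$$

Since $\|A_h x_{\delta h}^{n_k} - y_\delta\| \le \delta + \|x_{\delta h}^{n_k}\|h$ for any k, and the operator $A_h$ is linear and continuous, we have

$$\|A_h \bar x - y_\delta\| \le \delta + \overline{\lim_{k \to \infty}}\ \|x_{\delta h}^{n_k}\|\,h. \qquad (2.5)$$

From (2.4) and (2.5) we obtain

$$\|A_h \bar x - y_\delta\| \le \delta + \|x_{\delta h}\|h, \qquad (2.6)$$

and recalling that $\|x_{\delta h}\| = \inf_{x \in X}\{\|x\| : \|A_h x - y_\delta\| \le \delta + \|x_{\delta h}\|h\}$, we find from (2.4) and (2.6) that
$\bar x = x_{\delta h}$. Recalling that X is an E space, we find from this last equation and (2.4) that

$$x_{\delta h}^{n_k} \to x_{\delta h} \quad \text{as } n_k \to \infty,$$

which contradicts (2.3). This proves the theorem.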


3. An iterative method of solving problem (2.2) for $\gamma = 2$

Let X = Y = H, where H is a separable Hilbert space, and let $A_h$ be a linear one-to-one
continuous operator, mapping X into Y, with domain of values $R_{A_h}$ everywhere dense in Y; the
subspace $X_n$ satisfies the condition $\rho(y_\delta, A_h X_n) < \delta$.

Let $x_k$ be the k-th iteration, which satisfies the conditions $x_k \in X_n$ and $\|A_h x_k - y_\delta\| = \delta + \|x_k\|h$.

Then we obtain the (k + 1)-th iteration $x_{k+1}$ by solving the problem

$$\inf\{\|x\|^2 : x \in L(x_k,\ A'_{hn}(A_h x_k - y_\delta)),\ \|A_h x - y_\delta\| = \delta + \|x\|h\}. \qquad (3.1)$$

Here, $L(x_k, A'_{hn}(A_h x_k - y_\delta))$ is the linear span of the elements $x_k$ and $A'_{hn}(A_h x_k - y_\delta)$,
and $A'_{hn}$ is the operator adjoint to the operator $A_{hn}$, which is the restriction of the operator $A_h$
from the space X to $X_n$.


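Lemma 1

If $\rho(y_\delta, A_h X_n) < \delta$ and $\|A_h \hat x - y_\delta\| = \delta + \|\hat x\|h$, then $A'_{hn}(A_h \hat x - y_\delta) \neq 0$.

Proof. Under the conditions of the lemma,

$$A_h \hat x - y_\delta = (A_h \hat x - \mathrm{pr}(A_h X_n, y_\delta)) + (\mathrm{pr}(A_h X_n, y_\delta) - y_\delta), \qquad (3.2)$$

where $\mathrm{pr}(A_h X_n, y_\delta)$ is the metric projection of the element $y_\delta$ onto the subspace $A_h X_n$ and
$A_h \hat x - \mathrm{pr}(A_h X_n, y_\delta) \neq 0$.

By definition of the adjoint operator, for any $x \in X_n$ we have

$$(A'_{hn}(A_h \hat x - y_\delta),\ x) = (A_h \hat x - y_\delta,\ A_h x).$$

Hence, recalling the decomposition (3.2), we obtain

$$(A'_{hn}(A_h \hat x - y_\delta),\ x) = (A_h \hat x - \mathrm{pr}(A_h X_n, y_\delta),\ A_h x).$$

From this equation we have

$$(A'_{hn}(A_h \hat x - y_\delta),\ \tilde x) \neq 0,$$

where $\tilde x = \hat x - A_h^{-1}(\mathrm{pr}(A_h X_n, y_\delta))$. Hence $A'_{hn}(A_h \hat x - y_\delta) \neq 0$; this proves the lemma.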


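Since the hyperplane $\{y : (A_h \hat x - y_\delta,\ y) = (A_h \hat x - y_\delta,\ A_h \hat x)\}$ supports the sphere
$S_{\delta + \|\hat x\|h}(y_\delta) = \{y : \|y - y_\delta\| \le \delta + \|\hat x\|h\}$ at the point $A_h \hat x$, the corresponding hyperplane

$$G = \{x \in X_n : (A'_{hn}(A_h \hat x - y_\delta),\ x) = (A'_{hn}(A_h \hat x - y_\delta),\ \hat x)\}$$

will support the set $Q_{\delta h}^{n} = \{x \in X_n : \|A_h x - y_\delta\| \le \delta + \|\hat x\|h\}$ at the point $\hat x$.

Let $x_k$ be a point satisfying the condition $x_k \in X_n$, $\|A_h x_k - y_\delta\| = \delta + \|x_k\|h$; then the hyperplane
supporting the set $\{x \in X_n : \|A_h x - y_\delta\| \le \delta + \|x_{k+1}\|h\}$ at the (k + 1)-th iteration $x_{k+1}$ (see (3.1))
will either separate the set $\{x \in L(x_k, A'_{hn}(A_h x_k - y_\delta)) : \|A_h x - y_\delta\| \le \delta + \|x_k\|h\}$ and the
point 0, or will contain the point 0.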


Lemma 2

If $x_k$ is the k-th iteration and $x_k \neq x_{\delta h}^{n}$, while $x_{k+1}$ is the (k + 1)-th iteration, see (3.1), then
$\|x_{k+1}\| < \|x_k\|$.

Proof. Recalling the remark made above, it can be assumed without loss of generality that
$A'_{hn}(A_h x_k - y_\delta) \neq \lambda x_k$ for any values of $\lambda$, since otherwise $x_{k+1} = x_k$ and hence
$\|x_k\|^2 = \inf\{\|x\|^2 : x \in X_n,\ \|A_h x - y_\delta\| \le \delta + \|x_k\|h\}$, which contradicts the hypothesis of Lemma 2.

Obviously, the projection $\mathrm{pr}(G_k, 0)$ of the point 0 onto the hyperplane $G_k$ will satisfy the
conditions:

$$\|\mathrm{pr}(G_k, 0)\| \le \inf\{\|x\| : x \in X_n,\ \|A_h x - y_\delta\| \le \delta + \|x_k\|h\},$$

$$\mathrm{pr}(G_k, 0) = \lambda_0 A'_{hn}(A_h x_k - y_\delta),$$

where $\lambda_0$ is a number, and $G_k$ is the hyperplane supporting the set $\{x \in X_n : \|A_h x - y_\delta\| \le \delta + \|x_k\|h\}$
at the point $x_k$.

We choose a number $\varepsilon \neq 0$ such that the hyperplane $G_k$ does not separate the point
$\mathrm{pr}(G_k, 0) + \varepsilon A'_{hn}(A_h x_k - y_\delta)$ and the set $\{x \in X_n : \|A_h x - y_\delta\| \le \|A_h x_k - y_\delta\|\}$ and such that
$\|\mathrm{pr}(G_k, 0) + \varepsilon A'_{hn}(A_h x_k - y_\delta)\| < \|x_k\|$, and we consider the interval
$I = \{\alpha x_k + (1 - \alpha)[\mathrm{pr}(G_k, 0) + \varepsilon A'_{hn}(A_h x_k - y_\delta)],\ 0 \le \alpha \le 1\}$, containing the points
$\mathrm{pr}(G_k, 0) + \varepsilon A'_{hn}(A_h x_k - y_\delta)$ and $x_k$.

Then, since the boundary of the set $\{x \in X_n : \|A_h x - y_\delta\| \le \|A_h x_k - y_\delta\|\}$ is smooth, there will be
a point $\bar x_0 = \alpha_0 x_k + (1 - \alpha_0)[\mathrm{pr}(G_k, 0) + \varepsilon A'_{hn}(A_h x_k - y_\delta)]$, $0 < \alpha_0 < 1$, such that
$\|A_h \bar x_0 - y_\delta\| < \|A_h x_k - y_\delta\|$. Hence there exists a point $\tilde x_0 \in L(x_k, A'_{hn}(A_h x_k - y_\delta))$, $\|\tilde x_0\| = \|x_k\|$, and
$\|A_h \tilde x_0 - y_\delta\| < \|A_h x_k - y_\delta\|$, where $\|A_h x_k - y_\delta\| = \delta + \|x_k\|h$. But then, we can choose a number
$\mu_0$, $|\mu_0| < 1$, such that $\|A_h(\mu_0 \tilde x_0) - y_\delta\| \le \delta + \|\mu_0 \tilde x_0\|h$, whence it follows that $\|x_{k+1}\| < \|x_k\|$.
This proves the lemma.


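Lemma 3


Let the sequences $\{x_k\}, \{x_k'\} \subset X_n$ be such that $L(x_k, x_k') \cap \Omega_{\delta h}^{a_k} \ne \emptyset$, where $\Omega_{\delta h}^{a_k} = \{x \in X_n \mid \|A_hx - y_\delta\| \le \delta + a_kh\}$, $a_{k+1} \le a_k$, $a_k \to a$ as $k \to \infty$, and $x_k \to \bar{x}$, $x_k' \to \bar{x}'$ as $k \to \infty$, while $\bar{x} \ne \lambda\bar{x}'$ for any value of $\lambda$. Then,

$$\lim_{k\to\infty} \|\tilde{x}_k\| \le \|\tilde{x}\|,$$

where $\|\tilde{x}_k\|^2 = \inf\{\|x\|^2 \mid x \in L(x_k, x_k') \cap \Omega_{\delta h}^{a_k}\}$, $\|\tilde{x}\|^2 = \inf\{\|x\|^2 \mid x \in L(\bar{x}, \bar{x}') \cap \Omega_{\delta h}^{a}\}$.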


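Proof. Assume the contrary, i.e.,

$$\lim_{k\to\infty} \|\tilde{x}_k\| > \|\tilde{x}\|.$$

Then, a subsequence $\{\tilde{x}_{k_i}\}$ exists, such that

$$\|\tilde{x}_{k_i}\| \ge \|\tilde{x}\| + \varepsilon, \qquad (3.3)$$

where $\varepsilon$ is a positive number. Assume that $0 \notin G(\tilde{x})$, where $G(\tilde{x}) \subset X_n$ is the hyperplane supporting the set $\Omega_{\delta h}^{a}$ at the point $\tilde{x}$, which, since the boundary of the set $\Omega_{\delta h}^{a}$ is smooth, is the tangent hyperplane to the set $\Omega_{\delta h}^{a}$ at the point $\tilde{x}$, while in view of the fact that $0 \notin G(\tilde{x})$, i.e., the intersection $L(\bar{x}, \bar{x}') \cap \Omega_{\delta h}^{a}$ consists of more than one point, we have $L(x_k, x_k') \cap \Omega_{\delta h}^{a} \ne \emptyset$ for sufficiently large $k$. Denote by $x_k^0$ the point satisfying the relation

$$\|x_k^0\|^2 = \inf\{\|x\|^2 \mid x \in L(x_k, x_k') \cap \Omega_{\delta h}^{a}\}.$$

Since $\Omega_{\delta h}^{a} \subset \Omega_{\delta h}^{a_k}$, we have $\|\tilde{x}_k\| \le \|x_k^0\|$.

Consider the metric projection $\hat{x}_{k_i}$ of the element $\tilde{x}$ onto the set $L(x_{k_i}, x_{k_i}')$. We can assume without loss of generality that $\hat{x}_{k_i} \to \tilde{x}$ as $k_i \to \infty$, where $\tilde{x} \in L(\bar{x}, \bar{x}')$.


We then consider the sequence of points $\{z_{k_i}\} \subset G(\tilde{x})$, where $z_{k_i} = \lambda_{k_i}\hat{x}_{k_i}$ and $\lambda_{k_i}$ is a number. Obviously,

$$\|z_{k_i} - \tilde{x}\| = \|\hat{x}_{k_i} - \tilde{x}\| / \cos\alpha_{k_i}, \qquad (3.4)$$

where $\alpha_{k_i}$ is the angle between the vectors $z_{k_i} - \tilde{x}$ and $\hat{x}_{k_i} - \tilde{x}$,

$$\cos\alpha_{k_i} = \frac{(z_{k_i} - \tilde{x}, \hat{x}_{k_i} - \tilde{x})}{\|z_{k_i} - \tilde{x}\|\,\|\hat{x}_{k_i} - \tilde{x}\|}.$$

Since $0 \notin G(\tilde{x})$, we have $\sup\{(-\tilde{x}, x - \tilde{x}) \mid x \in G(\tilde{x}), \|x - \tilde{x}\| = 1\} < \|\tilde{x}\|$, and hence the angle $\alpha(G(\tilde{x}), \tilde{x})$ between the hyperplane $G(\tilde{x})$ and the element $\tilde{x}$ is positive. We can assume without loss of generality that $\alpha_{k_i} \to \alpha$ and $|\alpha| \le \pi/2 - \alpha(G(\tilde{x}), \tilde{x})$. Hence, for sufficiently large $k_i$, we have $|\alpha_{k_i}| \le \pi/2 - \alpha_1$, where $\alpha_1 > 0$.


From this and (3.4) we have

$$z_{k_i} \to \tilde{x} \quad \text{as } k_i \to \infty. \qquad (3.5)$$

We consider the sequence $\{\tilde{z}_{k_i}\}$, where $\tilde{z}_{k_i} = \nu_{k_i}\hat{x}_{k_i}$, $\nu_{k_i}$ is a number and $\|A_h\tilde{z}_{k_i} - y_\delta\| = \|A_h\tilde{x} - y_\delta\|$. Since the hyperplane $G(\tilde{x})$ is tangential to the set $\Omega_{\delta h}^{a}$ at the point $\tilde{x}$, we have

$$\|\tilde{z}_{k_i} - z_{k_i}\| \to 0 \quad \text{as } k_i \to \infty.$$

It follows from this and (3.5) that

$$\|\tilde{z}_{k_i}\| \to \|\tilde{x}\| \quad \text{as } k_i \to \infty. \qquad (3.6)$$

Since $\tilde{z}_{k_i} \in L(x_{k_i}, x_{k_i}') \cap \Omega_{\delta h}^{a}$ and $\|x_{k_i}^0\|^2 = \inf\{\|x\|^2 \mid x \in L(x_{k_i}, x_{k_i}') \cap \Omega_{\delta h}^{a}\}$, then $\|x_{k_i}^0\| \le \|\tilde{z}_{k_i}\|$, and hence $\|\tilde{x}_{k_i}\| \le \|\tilde{z}_{k_i}\|$, and in the light of (3.6) for sufficiently large $k_i$ we have $\|\tilde{x}_{k_i}\| \le \|\tilde{x}\| + \varepsilon/2$, which contradicts (3.3).


Assume that $0 \in G(\tilde{x})$; then,

$$L(\bar{x}, \bar{x}') \cap \Omega_{\delta h}^{a} = \{\tilde{x}\},$$

since otherwise, $\|\tilde{x}\|^2 \ne \inf\{\|x\|^2 \mid x \in L(\bar{x}, \bar{x}') \cap \Omega_{\delta h}^{a}\}$. Noting that $x_{k_i} \to \bar{x}$, $x_{k_i}' \to \bar{x}'$, and that the operator $A_h$ is continuous, we obtain $A_hx_{k_i} \to A_h\bar{x}$ and $A_hx_{k_i}' \to A_h\bar{x}'$. Since $a_{k_i} \to a$ as $k_i \to \infty$, the sequence of sets $S_{\delta+a_{k_i}h}(y_\delta)$ is $\beta$-convergent to the set $S_{\delta+ah}(y_\delta)$ (see [14]), and hence the sequence of sets $L(A_hx_{k_i}, A_hx_{k_i}') \cap S_{\delta+a_{k_i}h}(y_\delta)$ is $\beta$-convergent to the set $L(A_h\bar{x}, A_h\bar{x}') \cap S_{\delta+ah}(y_\delta)$, i.e.,

$$\sup\{\|y - A_h\tilde{x}\| \mid y \in L(A_hx_{k_i}, A_hx_{k_i}') \cap S_{\delta+a_{k_i}h}(y_\delta)\} \to 0 \qquad (3.7)$$

as $k_i \to \infty$, where $S_{\delta+a_{k_i}h}(y_\delta) = \{y \in A_hX_n \mid \|y - y_\delta\| \le \delta + a_{k_i}h\}$. Since the operator $A_h^{-1}$ is continuous on $A_hX_n$, the sequence of sets $A_h^{-1}(L(A_hx_{k_i}, A_hx_{k_i}') \cap S_{\delta+a_{k_i}h}(y_\delta))$ will, in the light of (3.6) and (3.7), be $\alpha$-convergent to the element $\tilde{x}$ (see [14]). But then, on the basis of the results of [14], and the fact that

$$\|\tilde{x}_{k_i}\|^2 = \inf\{\|x\|^2 \mid x \in A_h^{-1}(L(A_hx_{k_i}, A_hx_{k_i}') \cap S_{\delta+a_{k_i}h}(y_\delta))\},$$

we obtain $\tilde{x}_{k_i} \to \tilde{x}$ as $k_i \to \infty$, which contradicts (3.3). This proves the lemma.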





Algorithm for solving ill posed problems 


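Theorem 5

The successive iterations $x_k$ (see (3.1)) are convergent to $x_{\delta h}^n$ as $k \to \infty$.


Proof. Since $\|x_{k+1}\| < \|x_k\|$, we have

$$\|x_k\| \to a \quad \text{as } k \to \infty \quad \text{and} \quad \|x_k\| \ge a, \qquad (3.8)$$

where $a$ is a number.


Assume that $a > \|x_{\delta h}^n\|$. Since the sequence $\{x_k\}$ is bounded and belongs to the finite-dimensional space $X_n$, it is compact, and hence we can extract from it a convergent subsequence. Let

$$x_{k_i} \to \bar{x} \quad \text{as } k_i \to \infty. \qquad (3.9)$$

Then, $\|\bar{x}\| = a$. Since the hyperplane $G_{k_i}$, supporting the set $\Omega_{\delta h}^{x_{k_i}}$ at the point $x_{k_i}$, either separates the set $\{x \in X_n \mid \|A_hx - y_\delta\| \le \delta + \|x_{k_i}\|h\}$ and the point 0, or contains the point 0, the hyperplane supporting the set $\Omega_{\delta h}^{\bar{x}} = \{x \in X_n \mid \|A_hx - y_\delta\| \le \delta + \|\bar{x}\|h\}$ at the point $\bar{x}$ will also satisfy this condition.


Hence $A_{hn}'(A_h\bar{x} - y_\delta) \ne \lambda\bar{x}$ for any value of $\lambda$. Hence, by Lemma 3,

$$\|\tilde{x}\|^2 < \|\bar{x}\|^2, \qquad (3.10)$$

where $\|\tilde{x}\|^2 = \inf\{\|x\|^2 \mid x \in L(\bar{x}, A_{hn}'(A_h\bar{x} - y_\delta)) \cap \Omega_{\delta h}^{\bar{x}}\}$. Since the operator $A_{hn}'$ is continuous, we have

$$A_{hn}'(A_hx_{k_i} - y_\delta) \to A_{hn}'(A_h\bar{x} - y_\delta) \quad \text{as } k_i \to \infty, \qquad (3.11)$$

while on the basis of Lemma 1 we have $A_{hn}'(A_h\bar{x} - y_\delta) \ne 0$. But then, by Lemma 3, it follows from (3.9) and (3.11) that $\lim_{k_i\to\infty} \|\tilde{x}_{k_i}\| \le \|\tilde{x}\|$, where $\|\tilde{x}_{k_i}\|^2 = \inf\{\|x\|^2 \mid x \in L(x_{k_i}, A_{hn}'(A_hx_{k_i} - y_\delta)) \cap \Omega_{\delta h}^{x_{k_i}}\}$. Since $\tilde{x}_{k_i} = x_{k_i+1}$ and (3.10) is satisfied, we have $\|x_{k_i+1}\| < a$ for sufficiently large $k_i$, which contradicts (3.8). Hence $\|x_k\| \to \|x_{\delta h}^n\|$ as $k \to \infty$, and hence $x_k \to x_{\delta h}^n$. This proves the theorem.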


Translated by D. E. Brown


A NEW TRUNCATION PROCEDURE IN THE BAZLEY–FOX METHOD*
L. T. POZNYAK 


Leningrad 


(Received 4 May 1975) 


WHEN the Bazley—Fox method is used in the standard form, complicated transcendental equations 
have to be solved in order to obtain a lower bound for the eigenvalues of a self-adjoint positive 
definite operator A with a discrete spectrum. While, to avoid this difficulty, Bazley and Fox 
supplemented their method with several devices, it turned out that the devices do not provide good 
convergence in certain important classes of problem. The present paper offers a new means of 
simplifying the approximate equations to which the Bazley–Fox method leads. It reduces finding
lower bounds for the eigenvalues of the operator A to a problem of linear algebra, and has good
convergence-rate characteristics.


1. Introduction 


Suppose we are given, in a separable Hilbert space $H$ with scalar product $(\cdot,\cdot)$ and norm $|\cdot|$, the self-adjoint positive definite operator (pdo) $A$ with discrete spectrum; we pose the problem of finding the eigenvalues $\{\lambda_i\}$ of $A$. As usual, we assume that the eigenvalues are arranged in increasing order, allowing for their multiplicity, and that the corresponding eigenelements $\{u_i\}$ are orthonormalized in the energy space $H_A$ of the operator $A$: $(u_i, u_j)_A = \delta_{ij}$, $i, j = 1, 2, \ldots$ ($\delta_{ij}$ is the Kronecker delta). We make similar stipulations about the eigenvalues and eigenelements of any pdo with discrete spectrum that may be encountered below.





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 24–41, 1977.






Assume that there is a self-adjoint pdo $A_0$ in $H$ with known eigenvalues $\{\lambda_i^0\}$ and eigenelements $\{u_i^0\}$ which is semi-similar to the operator $A$ (i.e., the energy spaces $H_A$ and $H_{A_0}$ consist of the same elements, see [1]), and is connected with it by the relation

$$(u, v)_A = (u, v)_{A_0} + (Su, Sv)_1, \quad u, v \in H_{A_0}, \qquad (1.1)$$

where $S$ is an operator from $H$ into a Hilbert space $H_1$ with scalar product $(\cdot,\cdot)_1$ and norm $|\cdot|_1$.


Our problem can then be solved by the Bazley–Fox method (see e.g., [2]), which approximates the required eigenvalues from below. For this, we have to choose in $H_1$ a sequence of finite-dimensional subspaces $\{W_n\}$ asymptotically dense in $H_1$, and after arbitrarily fixing $n$, we have to evaluate the eigenvalues $\{\lambda_i^n\}$ of the pdo $A_n$, generated in $H$ by the closed bilinear form $(u, v)_{A_0} + (O_nSu, Sv)_1$, where $O_n$ is the orthogonal projector in $H_1$ onto $W_n$. Notice that, to evaluate the $\{\lambda_i^n\}$, we do not need to know the explicit form of the operator $A_n$; the eigenvalues $\{\lambda_i^n\}$ are fully defined by the above bilinear form, with the aid of which the eigenvalue problem for the operator $A_n$ can be written as the identity

$$(u, v)_{A_0} + (O_nSu, Sv)_1 = \lambda(u, v), \quad 0 \ne u \in H_{A_0}, \quad \forall v \in H_{A_0}. \qquad (1.2)$$


On varying $n$, we obtain for each eigenvalue $\lambda_i$ a sequence $\lambda_i^n$, $n = 1, 2, \ldots$, convergent from below to $\lambda_i$. The rate of this convergence is estimated in [3]. It was shown there that, for sufficiently large values of $n$,

$$\lambda_i - \lambda_i^n \le C_1 \sum_{j=0}^{\kappa} |(I_1 - O_n)Su_{s+j}|_1^2, \qquad (1.3)$$

where $I_1$ is the identity operator in $H_1$, $\kappa + 1$ is the multiplicity of the eigenvalue $\lambda_i$, $s$ is the least number for which $\lambda_s = \lambda_i$, and $C_1$ is a positive constant, independent of $n$.


If we restrict ourselves to the assumptions made above, it becomes necessary to solve complicated transcendental equations in order to determine the eigenvalues $\lambda_i^n$, $i = 1, 2, \ldots$. The computations can be simplified by imposing extra conditions on the subspaces $W_n$. The condition was originally pointed out by Bazley and Fox [4], and can be stated as

whatever the $n = 1, 2, \ldots$, there exists a number $N(n)$ such that $Su_i^0 \perp W_n$ for $i > N(n)$. (1.4)

In this case, the eigenvalues of the problem (1.2) can be found by solving an algebraic eigenvalue problem.


Experience shows that condition (1.4) is extremely rigid, and not many types of problem have as yet been found in which it is satisfied. Moreover, our studies of convergence for some of these problems (see e.g., [5]) have revealed that the rate of convergence of $\lambda_i^n$ to $\lambda_i$ may be extremely slow.


In addition to the method considered, involving a special choice of test spaces, Bazley and Fox [4, 6] found another means for overcoming the difficulties that arise when solving the intermediate problem (1.2). Their new idea was to solve problem (1.2) itself approximately, while still retaining the main aim of obtaining lower bounds for the eigenvalues $\lambda_i$. While realization of this idea demands certain restrictions, these are much less rigid than condition (1.4). The restrictions are as follows: 1) $\overline{D(S^*)} = H_1$, 2) $W_n \subset D(S^*)$, $n = 1, 2, \ldots$, where $S^*$ is the operator adjoint to $S$, and the bar denotes the closure operation. Restriction 1) follows from restriction 2) and the condition made at the start, that the sequence $\{W_n\}$ be asymptotically dense in $H_1$. The need for condition 2) is clear from the type of problem which Bazley and Fox proposed to solve instead of (1.2):

$$\lambda_{m+1}^0(u, v) + \sum_{j=1}^{m} (\lambda_j^0 - \lambda_{m+1}^0)(u, v_j^0)(v_j^0, v) + (S^*O_nSu, v) = \lambda(u, v), \quad 0 \ne u \in H, \quad \forall v \in H. \qquad (1.5)$$

Here, $v_i^0 = (\lambda_i^0)^{1/2}u_i^0$, $i = 1, 2, \ldots$, are the eigenelements of the operator $A_0$, orthonormalized in $H$.


Problem (1.5) is obtained from problem (1.2) by replacing the bilinear form $(u, v)_{A_0}$ by the bilinear form, bounded in $H$,

$$\sum_{j=1}^{m} (\lambda_j^0 - \lambda_{m+1}^0)(u, v_j^0)(v_j^0, v) + \lambda_{m+1}^0(u, v), \qquad (1.6)$$

and the form $(O_nSu, Sv)_1$ by the form $(S^*O_nSu, v)$, bounded in $H$. The second replacement is in a sense the identity transformation. For, by condition 2), $(O_nSu, Sv)_1 = (S^*O_nSu, v)$ for all $u, v \in D(S)$, so that the form $(S^*O_nSu, v)$ is an extension of the form $(O_nSu, Sv)_1$. It is clear that this single replacement does not change problem (1.2). The essence of the Bazley–Fox device lies in the first replacement. As a result of it, a problem is obtained, the eigenvalues of which are lower bounds as before for the eigenvalues $\lambda_i$, but they can be computed by solving an algebraic eigenvalue problem. The first of the above-mentioned properties of problem (1.5) follows from the fact that the form (1.6) is less than the form $(u, v)_{A_0}$. To the form (1.6) there corresponds in $H$ a symmetric bounded pdo $A_0^{(m)}$:

$$A_0^{(m)}u = \sum_{j=1}^{m} (\lambda_j^0 - \lambda_{m+1}^0)(u, v_j^0)v_j^0 + \lambda_{m+1}^0u,$$

which is called the $m$-th order truncation of the operator $A_0$.


This name originates from the method of obtaining the operator $A_0^{(m)}$: in the spectral decomposition

$$A_0u = \sum_{j=1}^{\infty} \lambda_j^0(u, v_j^0)v_j^0$$

of the operator $A_0$ in the space $H$, the eigenvalues $\lambda_{m+2}^0, \lambda_{m+3}^0, \ldots$ have to be replaced by the same eigenvalue $\lambda_{m+1}^0$. The form (1.6), which can now be written as $(A_0^{(m)}u, v)$, is called the $m$-th order truncation of the form $(u, v)_{A_0}$, while the method of replacing problem (1.2) by problem (1.5) is known as the truncation method.
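As a purely illustrative sketch (not part of the original paper), the truncation can be pictured on a finite list of eigenvalues of $A_0$. The helper name `truncate_spectrum` and the sample values are assumptions made for the example: every eigenvalue beyond the $(m+1)$-th is frozen at $\lambda_{m+1}^0$, so the truncated spectrum never exceeds the original one.

```python
from typing import List


def truncate_spectrum(eigs: List[float], m: int) -> List[float]:
    """Eigenvalues of the m-th order truncation of A_0.

    `eigs` holds lambda_1^0 <= lambda_2^0 <= ... (hypothetical sample data).
    Every eigenvalue with number greater than m + 1 is replaced by
    lambda_{m+1}^0, which is eigs[m] in 0-based indexing.
    """
    if not 0 <= m < len(eigs):
        raise ValueError("m must satisfy 0 <= m < len(eigs)")
    cutoff = eigs[m]  # lambda_{m+1}^0
    return [lam if j <= m else cutoff for j, lam in enumerate(eigs)]


sample = [1.0, 4.0, 9.0, 16.0, 25.0]
truncated = truncate_spectrum(sample, 2)
# The truncated spectrum is dominated term by term by the original one,
# which is the finite-dimensional analogue of "form (1.6) is less than (u, v)_{A_0}".
assert all(t <= lam for t, lam in zip(truncated, sample))
```

In this toy setting the lower-bound property of problem (1.5) reduces to the term-by-term comparison checked in the last line.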


The condition 2) enables us to write problem (1.5) in the operator form in the space $H$:

$$A_0^{(m)}u + S^*O_nSu = \lambda u.$$

It can easily be seen, by analyzing the structure of the operator $A_0^{(m)} + S^*O_nS$, that determination of its eigenvalues amounts to a problem of linear algebra.


While application of the truncation method rarely presents serious difficulties, it does not always give good results. Of course the poor results stem from the slow convergence of the method; but it is hard to say what causes this slow convergence, since we know so little as yet about the rate of convergence of the eigenvalues $\lambda_i^{nm}$ of the problem (1.5) to the exact eigenvalues $\lambda_i$.


In [6], Bazley and Fox obtained the estimate

$$\lambda_i^n - \lambda_i^{nm} \le C_2(n)(\lambda_{m+1}^0)^{-1}. \qquad (1.7)$$


They made the assumption that $A = A_0 + B$, where $B$ is a symmetric positive operator, and they took relation (1.1) in the form

$$(u, v)_A = (u, v)_{A_0} + (u, v)_B,$$

so that $H_1 = H_B$, while $S$ is equal to the identity operator considered as an operator from $H$ into $H_B$. Bazley and Fox did not investigate the dependence of the constant $C_2(n)$ on $n$. Weinelt attempted to explain it in [7]. Making the additional assumption that $W_n$ is the same as the linear hull of the elements $u_1^0, \ldots, u_n^0$, and that the operator $B$ is positive definite and comparable in force with the operator $A_0^\beta$, $0 \le \beta < 1$, i.e.,

$$|Bu| \le C_3|A_0^\beta u| \quad \forall u \in D(A_0^\beta), \quad C_3 = \mathrm{const} > 0,$$

Weinelt obtained for $C_2(n)$ the estimate (1.8) with a constant $C_4$ which is independent of $n$. This is all that is known about the convergence of $\lambda_i^{nm}$ to $\lambda_i^n$ for fixed $n$.


An idea of the convergence of $\lambda_i^{nm}$ to $\lambda_i$ can be gained by combining (1.7), (1.8) with (1.3). For instance, if $B$ is bounded ($\beta = 0$), it follows from (1.3), (1.7), and (1.8) that the double sequence $\lambda_i^{nm}$, $n, m = 1, 2, \ldots$, is convergent to $\lambda_i$ at a rate not less than

$$C_4(\lambda_{m+1}^0)^{-1} + C_1 \sum_{j=0}^{\kappa} |(E - O_n)u_{s+j}|_B^2,$$

where $E$ is the identity operator in $H$. Unfortunately, this case is not typical in practice. And in the case of an unbounded operator $B$, the expressions in question give no satisfactory estimate of the rate of convergence of $\lambda_i^{nm}$ to $\lambda_i$. If the asymptotic behaviour $n^\sigma$, $\sigma > 0$, of the eigenvalues $\{\lambda_n^0\}$ is known, then (1.3), (1.7), and (1.8) can be used to extract from the double sequence $\lambda_i^{nm}$, $n, m = 1, 2, \ldots$, the ordinary sequences $\lambda_i^{n\,m(n)}$, $n = 1, 2, \ldots$, and to estimate their rate of convergence to $\lambda_i$; the estimates thereby obtained have a low order.


Recall that everything just said about the convergence of $\lambda_i^{nm}$ to $\lambda_i$ only holds under the above special assumptions about the operators $A$, $A_0$ and the test subspaces $W_n$. In the general situation considered at the start of the section, nothing is known about the convergence of $\lambda_i^{nm}$ to $\lambda_i$.


This present state of affairs in the Bazley–Fox method compels us to look for new ways of realizing the basic idea of the method, concerning approximate solution of the intermediate problem (1.2).


A new way of simplifying problem (1.2) is described in the present paper. It differs from the method of truncations mainly in the fact that we replace the bilinear form $(u, v)$ in the identity (1.2) by a larger bilinear form, whereas Bazley and Fox replaced the form $(u, v)_{A_0}$ in it by a smaller bilinear form. The problem that then arises again has the "intermediate" property: its eigenvalues give lower bounds for the eigenvalues $\lambda_i$. We shall show that the determination of the eigenvalues of the new problem reduces to a problem of linear algebra. Finally, the feature of principal importance is that, under the general assumptions made below, an estimate can be obtained for the rate of convergence of the new approximate eigenvalues to the exact eigenvalues. The efficiency of the estimate is illustrated by an example of a Neumann problem for a two-dimensional second-order elliptic equation.


2. A new truncation procedure in the Bazley—Fox method 


Turning to a detailed treatment of the new approximate method for solving problem (1.2), we first observe that it can be used under the same general assumptions as the Bazley–Fox method itself.


The new method amounts to replacing the bilinear form $(u, v)$ in the identity (1.2) by the symmetric bilinear form

$$\sum_{j=1}^{m} [(\lambda_j^0)^{-1} - (\lambda_{m+1}^0)^{-1}](u, u_j^0)_{A_0}(u_j^0, v)_{A_0} + (\lambda_{m+1}^0)^{-1}(u, v)_{A_0}. \qquad (2.1)$$

The approximate problem which we propose to solve instead of problem (1.2) then has the form

$$(u, v)_{A_0} + (O_nSu, Sv)_1 = \lambda\Big\{(\lambda_{m+1}^0)^{-1}(u, v)_{A_0} + \sum_{j=1}^{m} [(\lambda_j^0)^{-1} - (\lambda_{m+1}^0)^{-1}](u, u_j^0)_{A_0}(u_j^0, v)_{A_0}\Big\},$$

$$0 \ne u \in H_{A_0}, \quad \forall v \in H_{A_0}. \qquad (2.2)$$


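It is easily shown that the form $(u, v)$ is less than the form (2.1):

$$(u, u) = \sum_{j=1}^{\infty} (\lambda_j^0)^{-1}|(u, u_j^0)_{A_0}|^2 \le \sum_{j=1}^{m} (\lambda_j^0)^{-1}|(u, u_j^0)_{A_0}|^2 + (\lambda_{m+1}^0)^{-1}\sum_{j=m+1}^{\infty} |(u, u_j^0)_{A_0}|^2$$

$$= \sum_{j=1}^{m} [(\lambda_j^0)^{-1} - (\lambda_{m+1}^0)^{-1}]|(u, u_j^0)_{A_0}|^2 + (\lambda_{m+1}^0)^{-1}|u|_{A_0}^2. \qquad (2.3)$$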


The form (2.1) is obviously positive definite in $H_{A_0}$, and corresponds in this space to the symmetric bounded pdo $(A_0^{-1})^{(m)}$:

$$(A_0^{-1})^{(m)}u = \sum_{j=1}^{m} [(\lambda_j^0)^{-1} - (\lambda_{m+1}^0)^{-1}](u, u_j^0)_{A_0}u_j^0 + (\lambda_{m+1}^0)^{-1}u. \qquad (2.4)$$

We aim to emphasize, by the notation $(A_0^{-1})^{(m)}$, that this operator is formally obtained from the operator $A_0^{-1}$ by means of the same procedure as the operator $A_0^{(m)}$ is obtained from the operator $A_0$. In fact, for $A_0^{-1}$ we write the spectral representation

$$A_0^{-1}u = \sum_{j=1}^{\infty} (\lambda_j^0)^{-1}(u, u_j^0)_{A_0}u_j^0$$

in the space $H_{A_0}$, and then we replace all the eigenvalues of $A_0^{-1}$ with numbers $m+1, m+2, \ldots$ in this representation by the $(m+1)$-th eigenvalue $(\lambda_{m+1}^0)^{-1}$. Hence it is natural to call $(A_0^{-1})^{(m)}$ the $m$-th order truncation of the operator $A_0^{-1}$ and call the present method the method of inverse operator truncations.
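The inequality between the form $(u, u)$ and the truncated form can be checked numerically in a finite-dimensional toy setting. The following is a minimal sketch under stated assumptions: the function names, the sample eigenvalues and the coefficient vector (standing in for the numbers $(u, u_j^0)_{A_0}$) are all hypothetical and chosen only for illustration.

```python
from typing import List


def inverse_truncation_weights(eigs: List[float], m: int) -> List[float]:
    """Eigenvalues of the m-th order truncation of A_0^{-1}.

    The first m inverse eigenvalues 1/lambda_j^0 are kept; all later ones
    are replaced by 1/lambda_{m+1}^0 (eigs[m] in 0-based indexing).
    """
    if not 0 <= m < len(eigs):
        raise ValueError("m must satisfy 0 <= m < len(eigs)")
    cutoff = 1.0 / eigs[m]
    return [1.0 / lam if j < m else cutoff for j, lam in enumerate(eigs)]


def quadratic_form(weights: List[float], coeffs: List[float]) -> float:
    """Sum of weights[j] * coeffs[j]**2, a diagonal quadratic form."""
    return sum(w * c * c for w, c in zip(weights, coeffs))


eigs = [1.0, 4.0, 9.0, 16.0, 25.0]      # hypothetical lambda_j^0
coeffs = [0.5, -1.0, 2.0, 0.25, 3.0]    # hypothetical (u, u_j^0)_{A_0}

exact = quadratic_form([1.0 / lam for lam in eigs], coeffs)          # (u, u)
enlarged = quadratic_form(inverse_truncation_weights(eigs, 2), coeffs)
# The truncated inverse dominates the exact inverse, so the form grows.
assert exact <= enlarged
```

Because each weight is only ever increased, the comparison holds for any coefficient vector, which mirrors the argument that the new form is larger than $(u, v)$.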


Let us return to problem (2.2). We shall show that it has eigenvalues representing lower bounds of the eigenvalues of the problem (1.2), and we shall give the method of calculating them.


Considering $S$ as an operator from $H_{A_0}$ into $H_1$, we introduce the adjoint operator $S'$, acting from $H_1$ into $H_{A_0}$:

$$(Su, w)_1 = (u, S'w)_{A_0} \quad \forall u \in H_{A_0}, \quad \forall w \in H_1.$$

It was shown in [3] that the operator $S'$ is bounded. Using this operator, we write (2.2) as an operator problem in $H_{A_0}$:

$$(I + S'O_nS)u = \lambda(A_0^{-1})^{(m)}u, \quad 0 \ne u \in H_{A_0}, \qquad (2.5)$$

where $I$ is the identity operator in $H_{A_0}$. We can write the last equation out more fully as

$$u + \sum_{i,j=1}^{d(n)} \alpha_{ij}(u, S'w_i)_{A_0}S'w_j = \lambda\Big\{(\lambda_{m+1}^0)^{-1}u + \sum_{j=1}^{m} [(\lambda_j^0)^{-1} - (\lambda_{m+1}^0)^{-1}](u, u_j^0)_{A_0}u_j^0\Big\}, \qquad (2.6)$$

where $w_1, \ldots, w_{d(n)}$ is the basis in $W_n$, and $(\alpha_{ij})$, $i, j = 1, 2, \ldots, d(n)$, is the inverse matrix to the matrix

$$((w_i, w_j)_1), \quad i, j = 1, 2, \ldots, d(n). \qquad (2.7)$$


Symmetric pdo's in $H_{A_0}$ appear in both sides of Eq. (2.5). It is clear from (2.6) that the subspace $V$, spanned by $u_1^0, \ldots, u_m^0, S'w_1, \ldots, S'w_{d(n)}$, reduces these operators. On solving problem (2.5) in the subspace, we obtain a finite number of eigenvalues of finite multiplicity $\mu_1^{nm} \le \ldots \le \mu_{r(n)}^{nm}$, where $r(n)$ denotes the dimensionality of the subspace $V$. In the orthogonal complement to $V$, Eq. (2.6) transforms into the equation $u = \lambda(\lambda_{m+1}^0)^{-1}u$, so that its spectrum in this complement consists of the unique eigenvalue $\lambda_{m+1}^0$, whose corresponding eigensubspace is the entire subspace $H_{A_0} \ominus V$. The eigenvalues $\mu_1^{nm}, \ldots, \mu_{r(n)}^{nm}, \lambda_{m+1}^0$ exhaust the spectrum of the problem (2.5).


We arrange the eigenvalues of the problem (2.5), not exceeding $\lambda_{m+1}^0$, in increasing order while allowing for their multiplicities: $\lambda_1^{nm} \le \lambda_2^{nm} \le \ldots$. Clearly, starting from some number $t$, not exceeding $r(n) + 1$, this sequence becomes "stationary": $\lambda_j^{nm} = \lambda_{m+1}^0$, $j \ge t$. In the case when $\lambda_{m+1}^0$ is the least eigenvalue of problem (2.5), the entire sequence $\lambda_j^{nm}$, $j = 1, 2, \ldots$, is stationary: $\lambda_j^{nm} = \lambda_{m+1}^0$ $\forall j$.


On now applying to problems (1.2), (2.2) the familiar comparison theorems, and recalling (2.3), we obtain the estimates of interest:

$$\lambda_i^{nm} \le \lambda_i^n \le \lambda_i, \quad i = 1, 2, \ldots. \qquad (2.8)$$

Since $\lambda_{m+1}^0 \to \infty$ as $m \to \infty$, and the right-hand sides of the inequalities (2.8) are independent of $m$, the number $\lambda_{m+1}^0$ cannot be the least eigenvalue of the problem (2.2) for sufficiently large $m$; for such values of $m$, there must necessarily be eigenvalues of finite multiplicity of problem (2.2) to the left of $\lambda_{m+1}^0$, the number of which increases without limit as $m$ increases.

In short, we have established that the inverse operator truncation method reduces the determination of lower bounds for the eigenvalues $\{\lambda_i\}$ to the solution of problem (2.6) in the finite-dimensional space $V$. In turn, this problem is equivalent to the matrix problem

$$Az = \lambda Bz \qquad (2.9)$$

with symmetric positive matrices $A$ and $B$ of order $r(n)$.
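To make the reduction concrete, here is a minimal sketch of solving a matrix problem of the form $Az = \lambda Bz$ in the smallest non-trivial case, order 2, by expanding $\det(A - \lambda B) = 0$ into a quadratic. The function name and the sample matrices are assumptions for illustration only; for realistic orders $r(n)$ one would rely on a numerical linear algebra library rather than a closed-form formula.

```python
import math
from typing import List, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def generalized_eigs_2x2(a: Matrix2, b: Matrix2) -> List[float]:
    """Roots of det(A - lambda*B) = 0 for symmetric 2x2 A and positive definite B."""
    (a11, a12), (_, a22) = a
    (b11, b12), (_, b22) = b
    qa = b11 * b22 - b12 * b12                        # det B, positive by assumption
    qb = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)   # linear coefficient
    qc = a11 * a22 - a12 * a12                        # det A
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)          # guard against rounding below zero
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])


# Hypothetical sample data: A = [[2, 1], [1, 2]], B = 2 * identity.
eigenvalues = generalized_eigs_2x2(((2.0, 1.0), (1.0, 2.0)), ((2.0, 0.0), (0.0, 2.0)))
assert abs(eigenvalues[0] - 0.5) < 1e-12 and abs(eigenvalues[1] - 1.5) < 1e-12
```

The eigenvalues are returned in increasing order, matching the convention used for the approximate eigenvalues in the text.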


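The matrices $A$, $B$ depend on the choice of basis in $V$. We can assume without loss of generality that $u_1^0, \ldots, u_m^0, S'w_1, \ldots, S'w_{r(n)-m}$ form a basis in $V$. With this basis, $A$ and $B$ have the block forms

$$A = \begin{pmatrix} E + C'D^{-1}C & K^* + C'D^{-1}F \\ K + F'D^{-1}C & M + F'D^{-1}F \end{pmatrix}, \qquad (2.10)$$

$$B = \begin{pmatrix} \Lambda_0 & \Lambda_0K^* \\ K\Lambda_0 & KL_0K^* + (\lambda_{m+1}^0)^{-1}M \end{pmatrix}, \qquad (2.11)$$

where $D$ denotes the matrix (2.7) and

$$E = (\delta_{ij}), \quad i, j = 1, 2, \ldots, m; \qquad \Lambda_0 = ((\lambda_i^0)^{-1}\delta_{ij}), \quad i, j = 1, 2, \ldots, m;$$

$$C = ((S'w_i, u_j^0)_{A_0}), \quad i = 1, 2, \ldots, d(n), \quad j = 1, 2, \ldots, m;$$

$$K = ((S'w_i, u_j^0)_{A_0}), \quad i = 1, 2, \ldots, r(n)-m, \quad j = 1, 2, \ldots, m;$$

$$F = ((S'w_i, S'w_j)_{A_0}), \quad i = 1, 2, \ldots, d(n), \quad j = 1, 2, \ldots, r(n)-m; \qquad (2.12)$$

$$M = ((S'w_i, S'w_j)_{A_0}), \quad i, j = 1, 2, \ldots, r(n)-m;$$

$$L_0 = ([(\lambda_i^0)^{-1} - (\lambda_{m+1}^0)^{-1}]\delta_{ij}), \quad i, j = 1, 2, \ldots, m.$$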


Expressions (2.7), (2.9)–(2.12) represent the computational formulae for the method of inverse operator truncations.

3. Convergence and convergence rate estimate

We shall start our study of the convergence of the approximate eigenvalues $\lambda_i^{nm}$, $i = 1, 2, \ldots$, by seeing how they depend on the index $m$. From the definition of the truncation $(A_0^{-1})^{(m)}$ it is clear that it decreases monotonically with respect to $m$: $(A_0^{-1})^{(m)} \ge (A_0^{-1})^{(m+1)} \ge A_0^{-1}$, $m = 1, 2, \ldots$. In view of this, for fixed $n$ every eigenvalue $\lambda_i^{nm}$ is monotonically increasing with respect to $m$:

$$\lambda_i^{nm} \le \lambda_i^{n,m+1}, \quad m = 1, 2, \ldots.$$

Let us show that, as $m \to \infty$, $\lambda_i^{nm}$ converges to $\lambda_i^n$. For this, we reduce each of problems (1.2) and (2.2) to an eigenvalue problem for a symmetric bounded operator in $H_{A_0}$. We have already taken a step towards this reduction in the case of the problem (2.2), by replacing the identity (2.2) by the equation (2.5). The same step, for the problem (1.2), leads to the following equation in $H_{A_0}$:

$$(I + S'O_nS)u = \lambda A_0^{-1}u. \qquad (3.1)$$

We introduce the notation $F_n = I + S'O_nS$. The properties of the operators $F_n$ were examined in detail in [3] and they will be used below without reference. If we make the replacement $v = F_n^{1/2}u$ in (2.5) and (3.1), it becomes obvious that $(\lambda_i^n)^{-1}$, $i = 1, 2, \ldots$, are the eigenvalues of the symmetric completely continuous operator $F_n^{-1/2}A_0^{-1}F_n^{-1/2}$ in $H_{A_0}$, while $(\lambda_i^{nm})^{-1}$, $i = 1, 2, \ldots$, are the eigenvalues of the symmetric bounded operator $F_n^{-1/2}(A_0^{-1})^{(m)}F_n^{-1/2}$ in the same space.


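By a well-known theorem* (see e.g., [1], p. 258),

$$(\lambda_i^{nm})^{-1} - (\lambda_i^n)^{-1} \le |F_n^{-1/2}((A_0^{-1})^{(m)} - A_0^{-1})F_n^{-1/2}|_{A_0}. \qquad (3.2)$$

It is easily shown that

$$|(A_0^{-1})^{(m)} - A_0^{-1}|_{A_0} = (\lambda_{m+1}^0)^{-1}. \qquad (3.3)$$

On further recalling that $|F_n^{-1/2}|_{A_0} \le 1$, we obtain from (3.2) and (3.3):

$$(\lambda_i^{nm})^{-1} - (\lambda_i^n)^{-1} \le (\lambda_{m+1}^0)^{-1},$$

or alternatively,

$$\lambda_i^n - \lambda_i^{nm} \le \lambda_i^n\lambda_i^{nm}(\lambda_{m+1}^0)^{-1}, \qquad (3.4)$$

which shows that $\lambda_i^{nm}$ are convergent to $\lambda_i^n$ as $m \to \infty$.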


On coarsening the inequality (3.4), we obtain an estimate for the rate of convergence of $\lambda_i^{nm}$ to $\lambda_i^n$:

$$\lambda_i^n - \lambda_i^{nm} \le \lambda_i^2(\lambda_{m+1}^0)^{-1}, \qquad (3.5)$$

in which, as distinct from the case of Bazley and Fox's estimate (1.7), the constant on the right-hand side is independent of $n$.


*The theorem was proved in [1] for completely continuous operators, but it remains true for the present operators, only one of which is completely continuous.


We can also obtain from (3.4) effective, practically computable estimates for the error introduced by the operation of truncation of the operator $A_0^{-1}$. In fact, an upper bound is easily obtained for the eigenvalue $\lambda_i$ by Ritz's method; this bound is usually available before application of the Bazley–Fox method (the latter is in fact used to estimate the error of the Ritz method). Denoting by $\bar{\lambda}_i$ an upper bound for $\lambda_i$, computed by the Ritz method, we find from (3.4) that

$$\lambda_i^n - \lambda_i^{nm} \le \bar{\lambda}_i\lambda_i^{nm}(\lambda_{m+1}^0)^{-1}, \qquad (3.6)$$

$$\lambda_i^n - \lambda_i^{nm} \le \bar{\lambda}_i^2(\lambda_{m+1}^0)^{-1}. \qquad (3.7)$$

The estimate (3.6) is a posteriori while the estimate (3.7) is a priori.
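A small sketch of how such bounds might be evaluated in practice follows. It assumes that the a posteriori bound reads $\bar{\lambda}_i\lambda_i^{nm}/\lambda_{m+1}^0$ and the a priori bound reads $\bar{\lambda}_i^2/\lambda_{m+1}^0$, which is how the garbled estimates (3.6) and (3.7) appear to follow from (3.4); the function name and the numbers are hypothetical.

```python
from typing import Tuple


def truncation_error_bounds(ritz_upper: float, lam_nm: float, lam0_next: float) -> Tuple[float, float]:
    """Bounds on lambda_i^n - lambda_i^{nm} caused by truncating A_0^{-1}.

    ritz_upper : upper bound for lambda_i (e.g. from the Ritz method)
    lam_nm     : computed approximate eigenvalue lambda_i^{nm}
    lam0_next  : base-problem eigenvalue lambda_{m+1}^0
    Returns (a_posteriori, a_priori) under the assumed forms of (3.6), (3.7).
    """
    if lam0_next <= 0.0:
        raise ValueError("lambda_{m+1}^0 must be positive")
    a_posteriori = ritz_upper * lam_nm / lam0_next
    a_priori = ritz_upper * ritz_upper / lam0_next
    return a_posteriori, a_priori


# Hypothetical numbers: Ritz bound 2.0, computed lower bound 1.5, lambda_{m+1}^0 = 100.
post, prior = truncation_error_bounds(2.0, 1.5, 100.0)
# The a posteriori bound is never worse when lam_nm <= ritz_upper.
assert post <= prior
```

The a priori value can be used to choose $m$ before any eigenvalue computation, while the a posteriori value tightens the estimate once $\lambda_i^{nm}$ is known.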


We now turn to a study of the behaviour of the approximate eigenvalue $\lambda_i^{nm}$ as a function of the two indices $n$ and $m$. To be more precise, we shall henceforth regard $\lambda_i^{nm}$ as a double sequence and examine its convergence regardless of how $n$ and $m$ tend to $\infty$. We shall give the same interpretation to the convergence of other objects (elements, operators) encountered below, dependent on the pair $n$, $m$.


The convergence of $\lambda_i^{nm}$ to $\lambda_i$ is proved in an elementary way on the basis of the inequalities (1.3) and (3.5):

$$\lambda_i - \lambda_i^{nm} \le \lambda_i^2(\lambda_{m+1}^0)^{-1} + C_1\sum_{j=0}^{\kappa} |(I_1 - O_n)Su_{s+j}|_1^2. \qquad (3.8)$$

At the same time, (3.8) gives an estimate for the rate of convergence of $\lambda_i^{nm}$ to $\lambda_i$. This is not a limiting estimate, however. A better estimate, of higher order with respect to $m$, can be obtained by comparing problem (2.2) directly with the initial problem

$$(u, v)_{A_0} + (Su, Sv)_1 = \lambda(u, v), \quad 0 \ne u \in H_{A_0}, \quad \forall v \in H_{A_0}, \qquad (3.9)$$

and estimating the error $\lambda_i - \lambda_i^{nm}$ in accordance with the same scheme as was used in [3] for estimating the error of the Bazley–Fox method without truncation.


Let us briefly run over the scheme. We first have to reduce the initial problem (3.9) to the equivalent operator problem in the space $H_{A_0}$: $(I + S'S)u = \lambda A_0^{-1}u$; then the latter, and the approximate equation (2.5), have to be transformed respectively to

$$u = \lambda F^{-1}A_0^{-1}u, \qquad (3.10)$$

$$u = \lambda F_n^{-1}(A_0^{-1})^{(m)}u, \qquad (3.11)$$

where $F = I + S'S$. It is easily seen that $F^{-1}A_0^{-1} = A^{-1}$. For brevity, we introduce the simpler notation $T_{nm}$ for the operator $F_n^{-1}(A_0^{-1})^{(m)}$.


When estimating the error $\lambda_i - \lambda_i^{nm}$ an important role is played by the difference

$$R_{nm} = T_{nm} - A^{-1}, \qquad (3.12)$$

and in particular, by its two obvious representations

$$R_{nm} = F^{-1}S'(I_1 - O_n)ST_{nm} + F^{-1}[(A_0^{-1})^{(m)} - A_0^{-1}], \qquad (3.13)$$

$$R_{nm} = F_n^{-1}S'(I_1 - O_n)SA^{-1} + F_n^{-1}[(A_0^{-1})^{(m)} - A_0^{-1}], \qquad (3.14)$$

and the property

$$|R_{nm}|_A \to 0 \quad \text{as } n, m \to \infty, \qquad (3.15)$$

which is proved in the same way as in [3].


The operator $R_{nm}$ characterizes the proximity of the exact problem (3.10) to the approximate problem (3.11); naturally, the error $\lambda_i - \lambda_i^{nm}$ also depends on it. The nature of the latter dependence is proved in the same way as in [3]. In fact, we initially obtain, with the aid of (3.10) and (3.11):

$$(I - \lambda_iA^{-1})u_i^{nm} = (\lambda_i^{nm})^{-1}(\lambda_i^{nm} - \lambda_i)u_i^{nm} + \lambda_iR_{nm}u_i^{nm}, \qquad (3.16)$$

where $u_i^{nm}$ is the eigenelement of the problem (3.11) corresponding to the eigenvalue $\lambda_i^{nm}$. Then, multiplying (3.16) scalarly in $H_A$ by the element $P_iu_i^{nm}$, we find

$$0 = (\lambda_i^{nm})^{-1}(\lambda_i^{nm} - \lambda_i)(u_i^{nm}, P_iu_i^{nm})_A + \lambda_i(R_{nm}u_i^{nm}, P_iu_i^{nm})_A, \qquad (3.17)$$

where $P_i$ is the orthogonal projector in $H_A$ onto the subspace spanned by $u_s, u_{s+1}, \ldots, u_{s+\kappa}$. Finally, noting that $(u_i^{nm}, P_iu_i^{nm})_A = |P_iu_i^{nm}|_A^2$, and for brevity, putting $y = |P_iu_i^{nm}|_A^{-1}u_i^{nm}$, we obtain from (3.17) the required expression

$$\lambda_i - \lambda_i^{nm} = \lambda_i\lambda_i^{nm}(R_{nm}y, P_iy)_A. \qquad (3.18)$$

Notice that, when obtaining (3.18), we have tacitly assumed that $|P_iu_i^{nm}|_A > 0$. We justify this assumption below, for sufficiently large $n$ and $m$.


The next "block" in the scheme of arguments in [3] is to isolate the "unimportant" terms in the scalar product $(R_{nm}y, P_iy)_A$. Replacing $R_{nm}$ in accordance with (3.13) and using relation (1.1), we can write $(R_{nm}y, P_iy)_A$ as the sum

$$(R_{nm}y, P_iy)_A = (O^{(n)}ST_{nm}P_iy, O^{(n)}SP_iy)_1 + |((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}^2$$

$$+ (O^{(n)}ST_{nm}P^{(i)}y, O^{(n)}SP_iy)_1 + (((A_0^{-1})^{(m)} - A_0^{-1})P^{(i)}y, P_iy)_{A_0}, \qquad (3.19)$$

where $O^{(n)} = I_1 - O_n$, $P^{(i)} = I - P_i$.


We obtain the expression for $T_{nm}$ from relations (3.12) and (3.14):

$$T_{nm} = A^{-1} + F_n^{-1}S'O^{(n)}SA^{-1} + F_n^{-1}((A_0^{-1})^{(m)} - A_0^{-1}),$$

and we substitute this expression into the first term on the right-hand side of Eq. (3.19). After obvious transformations, we obtain

$$(R_{nm}y, P_iy)_A = \lambda_i^{-1}|O^{(n)}SP_iy|_1^2 + |((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}^2$$

$$+ \lambda_i^{-1}|F_n^{-1/2}S'O^{(n)}SP_iy|_{A_0}^2 + (SF_n^{-1}((A_0^{-1})^{(m)} - A_0^{-1})P_iy, O^{(n)}SP_iy)_1 \qquad (3.20)$$

$$+ (O^{(n)}ST_{nm}P^{(i)}y, O^{(n)}SP_iy)_1 + (((A_0^{-1})^{(m)} - A_0^{-1})P^{(i)}y, P_iy)_{A_0}.$$

This is in fact the required expansion of the scalar product $(R_{nm}y, P_iy)_A$ into essential and inessential terms. Let us show that the last three terms in (3.20) are inessential. We shall show, in fact, that each of them has higher order than $\theta_{nm} = |O^{(n)}SP_iy|_1^2 + |((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}^2$. For the first term, the proof is easy:

$$|(SF_n^{-1}((A_0^{-1})^{(m)} - A_0^{-1})P_iy, O^{(n)}SP_iy)_1|$$

$$\le |S|_{01}\,|((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}|_{A_0}\,|((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}\,|O^{(n)}SP_iy|_1 \qquad (3.21)$$

$$\le |S|_{01}\,|(A_0^{-1})^{(m)} - A_0^{-1}|_{A_0}^{1/2}\,\theta_{nm},$$

where $|\cdot|_{01}$ denotes the norm of the bounded operator acting from $H_{A_0}$ into $H_1$. For the other two terms, the required bound cannot be obtained directly, since they contain the expression $P^{(i)}y$, the connection of which with the quantities $|O^{(n)}SP_iy|_1$ and $|((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}$ is as yet unknown. The connection is established by:


Lemma


The pair $n(i)$, $m(i)$ exists such that, for $n \ge n(i)$, $m \ge m(i)$, we have

$$|P_iu_i^{nm}|_A > 0,$$

$$|P^{(i)}y|_A \le C_5(|O^{(n)}SP_iy|_1 + |((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}), \qquad (3.22)$$

where the constant $C_5$ is independent of $n$, $m$, and $i$.


The proof is the same as the proof of Lemma 8 in [3].


We can now easily obtain the required estimates:

$$|(((A_0^{-1})^{(m)} - A_0^{-1})P^{(i)}y, P_iy)_{A_0}| \le 2C_5|(A_0^{-1})^{(m)} - A_0^{-1}|_{A_0}^{1/2}\,\theta_{nm}, \qquad (3.23)$$

$$|(O^{(n)}ST_{nm}P^{(i)}y, O^{(n)}SP_iy)_1| \le 2C_5|O^{(n)}ST_{nm}|_{01}\,\theta_{nm}. \qquad (3.24)$$

Notice that the convergence to zero of the quantity $|O^{(n)}ST_{nm}|_{01}$ follows from (3.12), (3.15), and Lemma 1 of [3]. In short, when estimating the rate of convergence of $\lambda_i^{nm}$ to $\lambda_i$ we can neglect the last three terms in (3.20). The third term on the right of (3.20) also has no influence on the order of smallness of the error $\lambda_i - \lambda_i^{nm}$ since

$$0 \le |F_n^{-1/2}S'O^{(n)}SP_iy|_{A_0}^2 \le |S|_{01}^2|O^{(n)}SP_iy|_1^2. \qquad (3.25)$$

A more exact result, which follows from (3.18), (3.20), (3.21), and (3.23)–(3.25), can be stated as follows.


Theorem 1


If the sequence of subspaces $\{W_n\}$ is asymptotically dense in $H_1$, then, given any $i = 1, 2, \ldots$, a pair $n'(i) \ge n(i)$, $m'(i) \ge m(i)$ can be found such that, for $n \ge n'(i)$, $m \ge m'(i)$, we have the two-sided estimate

$$0.5\lambda_i|O^{(n)}SP_iy|_1^2 + 0.5\lambda_i^2|((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}^2 \le \lambda_i - \lambda_i^{nm}$$

$$\le C_6\lambda_i|O^{(n)}SP_iy|_1^2 + 2\lambda_i^2|((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}P_iy|_{A_0}^2, \qquad (3.26)$$

where $C_6 = 1 + |S|_{01}^2$.

If we use the same method as in [3], and expand $P_iy$ in eigenelements $u_s, \ldots, u_{s+\kappa}$ corresponding to the eigenvalue $\lambda_i$, we easily obtain from (3.26) an estimate connecting the error $\lambda_i - \lambda_i^{nm}$ with the errors of approximation of the elements $Su_s, \ldots, Su_{s+\kappa}$.

Theorem 2


Let the conditions of Theorem 1 hold. Then, for $m \ge m'(i)$, $n \ge n'(i)$,

$$\lambda_i - \lambda_i^{nm} \le C_7(\lambda_i)\sum_{j=0}^{\kappa} \big(|O^{(n)}Su_{s+j}|_1^2 + |((A_0^{-1})^{(m)} - A_0^{-1})^{1/2}u_{s+j}|_{A_0}^2\big), \qquad (3.27)$$

where the constant $C_7(\lambda_i)$ depends only on $\lambda_i$. In the case of a simple eigenvalue ($\kappa = 0$) the order of the estimate (3.27) cannot be improved.


It is only possible to estimate the order of smallness of the best approximation |O‘ Su,| ; 
if we take concrete operators A, S and concrete subspaces W,,. With regard to the quantity | ( (A,~*) 
(™)—Ay~*)“u;| 4, an estimation is possible without auxiliary assumptions. We introduce the 
orthogonal projector Q,,, in H4q onto the subspace stretched over W;",..., w»°, We know that Q,,, 
is also the orthogonal projector in H. Using the definition of the truncation (A,~')‘” and the 
spectral representation in H4q of the operator Ag— 1 we can easily show that 


a Ae yea) As") *(T—-Oa). 
In the light of this equation and the estimate (3.3), we have 


| ( (Ay—') °—A,-') “ala 7 ee ek (I—Qm) W5\ Age (3.28) 


If it is assumed that $u_j\in D(A_0^{0.5+\beta})$, $j=1,2,\ldots$, for some $\beta>0$, then instead of (3.28)
we can obtain the better estimate

$$|((A_0^{-1})^{(m)}-A_0^{-1})^{1/2}u_j|_A\le(\lambda_{m+1}^0)^{-0.5-\beta}\,|(I-Q_m)A_0^{0.5+\beta}u_j|.\qquad(3.29)$$

For, under the condition indicated,

$$|(I-Q_m)u_j|_{A_0}=|A_0^{-\beta}(I-Q_m)A_0^{0.5+\beta}u_j|\le|A_0^{-\beta}(I-Q_m)|\,|(I-Q_m)A_0^{0.5+\beta}u_j|=(\lambda_{m+1}^0)^{-\beta}\,|(I-Q_m)A_0^{0.5+\beta}u_j|,$$

which, in conjunction with (3.28), gives (3.29). We have thus proved:
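
The inequality used in this step can be checked numerically in a diagonal toy model. The sketch below is only an illustration under stated assumptions: the spectrum $\lambda_k^0=k$, the coefficients $c_k=k^{-2}$ and the names `tail_norm` and `check_tail_bound` are hypothetical and do not come from the paper. It compares $|(I-Q_m)u|_{A_0}$ with $(\lambda_{m+1}^0)^{-\beta}|(I-Q_m)A_0^{0.5+\beta}u|$.

```python
import math

def tail_norm(coeffs, eigenvalues, m, power):
    """Norm of A0**power (I - Q_m) u for a diagonal operator A0.

    coeffs[k] is the k-th expansion coefficient of u in the eigenbasis of A0,
    eigenvalues[k] is the matching eigenvalue, and Q_m keeps the first m modes.
    """
    return math.sqrt(sum((eigenvalues[k] ** power * coeffs[k]) ** 2
                         for k in range(m, len(coeffs))))

def check_tail_bound(n_modes=400, m=20, beta=0.2):
    # Assumed toy data: eigenvalues lambda_k = k, coefficients c_k = k**-2.
    eigenvalues = [float(k) for k in range(1, n_modes + 1)]
    coeffs = [k ** -2.0 for k in range(1, n_modes + 1)]
    # Left side: |(I - Q_m) u| in the A0-norm, i.e. |A0**0.5 (I - Q_m) u|.
    lhs = tail_norm(coeffs, eigenvalues, m, 0.5)
    # Right side: lambda_{m+1}**(-beta) * |(I - Q_m) A0**(0.5 + beta) u|.
    rhs = eigenvalues[m] ** (-beta) * tail_norm(coeffs, eigenvalues, m, 0.5 + beta)
    return lhs, rhs

lhs, rhs = check_tail_bound()
print(lhs <= rhs)
```

In this model the bound holds mode by mode, because $\lambda_k^{-\beta}\le\lambda_{m+1}^{-\beta}$ for every retained index $k\ge m+1$.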


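Corollary


If the conditions of Theorem 1 hold, $n\ge n'(i)$, $m\ge m'(i)$ and $u_j\in D(A_0^{0.5+\beta})$, $j=1,2,\ldots$,
for some $\beta>0$, then

$$\lambda_i-\lambda_i^{(n,m)}\le C_i(\lambda_i)\sum_{j=0}^{\varkappa}|O^{(n)}Su_{i+j}|_1^2+o\bigl((\lambda_{m+1}^0)^{-1-2\beta}\bigr).\qquad(3.30)$$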


Note. In the next section we shall consider an example in which the test subspaces form a
generalized and not an ordinary sequence $\{W_n\}$. The extension of the earlier results to this case is
trivial: we simply have to replace the index $n$ in all the expressions by the symbol $\alpha$.

4. Application to the Neumann problem


The main practical difficulties that arise when using the new truncation operation are
connected with the operator $S'$. We previously had to deal with this operator in [3, 5], though
there it played an auxiliary role. Now, the operator $S'$ occurs in the computational expressions
(2.12).


It is only rarely that $S'$ can be found in explicit, closed form. Fortunately, this is not
essential. The practical problem concerned with $S'$ may be stated alternatively:
select the test subspaces $W_n$ (or $W_\alpha$) in such a way that the operator $S'$ can be simply evaluated
on the elements of $W_n$ ($W_\alpha$). While this last problem is again difficult, the important example given
below shows that a solution is possible.


Let $\Omega$ be a circle or rectangle, and $\partial\Omega$ the boundary of $\Omega$, $\bar\Omega=\Omega\cup\partial\Omega$. Given in $\bar\Omega$ the
functions $a(x_1,x_2)$, $a_{ij}(x_1,x_2)$, $i,j=1,2$, satisfying the conditions:


t=const>1, a(x, 22) >1. 


In the space $H=L_2(\Omega)$ we define the self-adjoint pdo $A$ by the relations

$$D(A)=\{u\mid u\in W_2^2(\Omega),\ \partial u/\partial N\equiv P\nabla u\cdot\nu=0\ \text{on}\ \partial\Omega\},\qquad Au=-\operatorname{div}P\nabla u+au,$$

where $P=(a_{ij})$, $i,j=1,2$; $\nu$ is the unit vector (written in column form, as are all vectors)
of the inward normal to $\partial\Omega$. Hence $A$ is the operator of the Neumann problem for a second-order
two-dimensional elliptic equation, and its properties are well known.
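We specify the operator $A_0$ by the expressions

$$D(A_0)=\{u\mid u\in W_2^2(\Omega),\ \partial u/\partial\nu=0\ \text{on}\ \partial\Omega\},\qquad A_0u=-\Delta u+u.$$

We have

$$(u,v)_A=\iint_\Omega(P\nabla u\cdot\nabla v+auv)\,d\Omega,$$

$$H_{A_0}=W_2^1(\Omega),\qquad(u,v)_{A_0}=\iint_\Omega(\nabla u\cdot\nabla v+uv)\,d\Omega,$$

so that

$$(u,v)_A=(u,v)_{A_0}+\iint_\Omega[(P-E)\nabla u\cdot\nabla v+(a-1)uv]\,d\Omega,\qquad(4.2)$$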


where $E=(\delta_{ij})$, $i,j=1,2$. If we put $H_1=L_2(\Omega)\times L_2(\Omega)\times L_2(\Omega)$, $D(S)=W_2^1(\Omega)$,
$Su=(u_{x_1},u_{x_2},u)^T$, where $T$ denotes transposition, $u_{x_i}=\partial u/\partial x_i$,


om MS 


ge 
(a— 1) "fp r 
then Eq. (4.2) can be written in the form (1.1) and all the conditions for application of the
Bazley-Fox method to the problem $Au=\lambda u$ are satisfied.

In order to find the test subspaces in which we are able to compute the values of the operator
$S'$, we have to analyze in more detail the structure of the space $H_1$.


Theorem 3 


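The space $H_1=L_2(\Omega)\times L_2(\Omega)\times L_2(\Omega)$ can be written as the orthogonal sum of the three
subspaces

$$H_1=X\oplus Y\oplus Z,$$

where

$$X=\{w\mid w=(u_{x_1},u_{x_2},u)^T,\ u\in W_2^1(\Omega)\},$$

$$Y=\{w\mid w=(u_{x_2},-u_{x_1},0)^T,\ u\in\mathring{W}_2^1(\Omega)\},$$

$$Z=\{w\mid w=(u_{x_1},u_{x_2},\Delta u)^T,\ u\in W_2^2(\Omega),\ \partial u/\partial\nu=0\ \text{on}\ \partial\Omega\}.$$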


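Proof. It is easily shown that $X$, $Y$ are subspaces. Their orthogonality follows from the
well-known theorem on the decomposition of the space $L_2(\Omega)\times L_2(\Omega)$ into an orthogonal sum
of the subspaces $G=\{g\mid g=(u_{x_1},u_{x_2})^T,\ u\in W_2^1(\Omega)\}$ and $J=\{g\mid g=(u_{x_2},-u_{x_1})^T,\ u\in\mathring{W}_2^1(\Omega)\}$.
For, given any pair $\varphi\in X$, $\psi\in Y$, we have

$$\varphi=(u_{x_1},u_{x_2},u)^T,\ u\in W_2^1(\Omega),\qquad\psi=(v_{x_2},-v_{x_1},0)^T,\ v\in\mathring{W}_2^1(\Omega),$$

$$(\varphi,\psi)_1=\iint_\Omega(u_{x_1}v_{x_2}-u_{x_2}v_{x_1})\,d\Omega,$$

and since $(u_{x_1},u_{x_2})^T\in G$, $(v_{x_2},-v_{x_1})^T\in J$, we have $(\varphi,\psi)_1=0$.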


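It remains to show that $(X\oplus Y)^\perp=Z$. We take an arbitrary element
$w=(w_1(x_1,x_2),w_2(x_1,x_2),w_3(x_1,x_2))^T\in(X\oplus Y)^\perp$. We have

$$\iint_\Omega(u_{x_1}w_1+u_{x_2}w_2+uw_3)\,d\Omega=0\quad\forall u\in W_2^1(\Omega),\qquad(4.3)$$

$$\iint_\Omega(v_{x_2}w_1-v_{x_1}w_2)\,d\Omega=0\quad\forall v\in\mathring{W}_2^1(\Omega).\qquad(4.4)$$

By the theorem just mentioned, on decomposition of the space $L_2(\Omega)\times L_2(\Omega)$, it follows from
(4.4) that $w_1=z_{x_1}$, $w_2=z_{x_2}$, where $z\in W_2^1(\Omega)$. On using this fact in the identity (4.3), we get

$$\iint_\Omega(u_{x_1}z_{x_1}+u_{x_2}z_{x_2}+uw_3)\,d\Omega=0\quad\forall u\in W_2^1(\Omega).$$

This last identity implies that $z(x_1,x_2)$ is the generalized solution of the Neumann problem

$$\Delta z=w_3\ \text{in}\ \Omega,\qquad\partial z/\partial\nu=0\ \text{on}\ \partial\Omega.\qquad(4.5)$$

Since $\Omega$ is a circle or rectangle, Eqs. (4.5) will hold almost everywhere, in $\Omega$ and on $\partial\Omega$ respectively
(see [8]). The theorem is proved.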
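
The mutual orthogonality of $X$, $Y$ and $Z$ can be verified exactly on sample elements. The sketch below is an illustration only: it takes $\Omega$ to be the unit square, uses polynomial test functions chosen for convenience (an arbitrary $u$, an $f$ vanishing on the boundary, a $z$ with zero normal derivative), and represents polynomials as dictionaries with exact rational arithmetic. None of these choices or helper names come from the paper.

```python
from fractions import Fraction

# A polynomial in (x, y) is a dict {(i, j): coefficient of x**i * y**j}.

def mul(p, q):
    r = {}
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            r[(a + d, b + e)] = r.get((a + d, b + e), 0) + c * f
    return r

def add(p, q):
    r = dict(p)
    for k, c in q.items():
        r[k] = r.get(k, 0) + c
    return r

def scale(p, c):
    return {k: c * v for k, v in p.items()}

def dx(p):
    return {(a - 1, b): a * c for (a, b), c in p.items() if a > 0}

def dy(p):
    return {(a, b - 1): b * c for (a, b), c in p.items() if b > 0}

def integrate_unit_square(p):
    # Exact integral of the polynomial over [0, 1] x [0, 1].
    return sum(Fraction(c) / ((a + 1) * (b + 1)) for (a, b), c in p.items())

def inner(w1, w2):
    # Inner product in L2 x L2 x L2 of two three-component elements.
    return sum(integrate_unit_square(mul(a, b)) for a, b in zip(w1, w2))

x, y, one = {(1, 0): 1}, {(0, 1): 1}, {(0, 0): 1}
u = add(mul(mul(x, x), y), add(y, one))          # u = x^2 y + y + 1
bx = add(x, scale(mul(x, x), -1))                # x (1 - x)
by = add(y, scale(mul(y, y), -1))                # y (1 - y)
f = mul(bx, by)                                  # vanishes on the boundary
z = mul(mul(bx, bx), mul(by, by))                # zero normal derivative

X_elem = (dx(u), dy(u), u)
Y_elem = (dy(f), scale(dx(f), -1), {})
Z_elem = (dx(z), dy(z), add(dx(dx(z)), dy(dy(z))))

products = (inner(X_elem, Y_elem), inner(X_elem, Z_elem), inner(Y_elem, Z_elem))
print(products)
```

All three inner products vanish identically, as the integration-by-parts argument in the proof predicts, while `inner(X_elem, X_elem)` is strictly positive, so the check is not vacuous.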


The set $\{u\mid u\in W_2^2(\Omega),\ \partial u/\partial\nu=0\ \text{on}\ \partial\Omega\}$ is a subspace of the space $W_2^2(\Omega)$. For
brevity we shall henceforth denote it by $\Phi$.


Let us turn to the construction of the test subspaces. They will now depend on the three
integer indices $l$, $p$, $q$. The set of all triples $\alpha=(l,p,q)$ will be assumed to be partially ordered
in the natural way: $\alpha\le\alpha'$ if $l\le l'$, $p\le p'$, $q\le q'$. Since this is a directed set, the test subspaces
$W_\alpha$ will form a generalized sequence. Roughly speaking, the subspace $W_\alpha$ will be constructed
according to the same principle as that on which the space $H_1$ was constructed in Theorem 3. We
choose three sequences of finite-dimensional subspaces $\{L_i\}$, $\{M_i\}$, $\{U_i\}$, lying respectively in
$W_2^1(\Omega)$, $\mathring{W}_2^1(\Omega)$ and $\Phi$. After fixing an arbitrary triple $\alpha=(l,p,q)$, we define $W_\alpha$ as the
collection of all elements $w$ of the type $w=\mathfrak{R}^{-1}(\varphi+\psi+\zeta)$, where $\varphi=(v_{x_1},v_{x_2},v)^T$, $v\in L_l$,
$\psi=(f_{x_2},-f_{x_1},0)^T$, $f\in M_p$, $\zeta=(z_{x_1},z_{x_2},\Delta z)^T$, $z\in U_q$.


Let us find $S'w$ for $w\in W_\alpha$. We have


(Su, w),=(A-'Su, ptyptt), = {f (Vu Vut+uv) dQ 


+ J (Ux,fc;—Ux.fx,) dQ + I) (Vu Vz+u Az) dQ VueD(S), 


and since the last two integrals vanish in accordance with Theorem 3, we have $(Su,w)_1=(u,v)_{A_0}$.
But this implies that $S'w=v$. In short, the difficulty arising in applying the inverse operator
truncation method can be overcome in practice in the present example.


In order for the approximate eigenvalues to converge to the exact ones, the generalized sequence
$\{W_\alpha\}$ has to be asymptotically dense in $H_1$. Sufficient conditions for this are given by:


Theorem 4


If the sequences $\{L_i\}$, $\{M_i\}$, $\{U_i\}$ are asymptotically dense in $W_2^1(\Omega)$, $\mathring{W}_2^1(\Omega)$ and $\Phi$,
respectively, then the generalized sequence $\{W_\alpha\}$ is asymptotically dense in $H_1$.


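Proof. Let $\Pi_l$ and $\Theta_p$ be the orthogonal projectors in $W_2^1(\Omega)$ onto $L_l$ and $M_p$ respectively,
and let $\Gamma_q$ be the orthogonal projector in $W_2^2(\Omega)$ onto $U_q$. We take an arbitrary element $w\in H_1$,
multiply it on the left by the matrix $\mathfrak{R}$ and, on the basis of Theorem 3, decompose $\mathfrak{R}w$ into a
sum of three mutually orthogonal terms:

$$\mathfrak{R}w=(\mathfrak{R}w)_X+(\mathfrak{R}w)_Y+(\mathfrak{R}w)_Z,\qquad(4.6)$$

where $(\mathfrak{R}w)_X=(v_{x_1},v_{x_2},v)^T$, $v\in W_2^1(\Omega)$, $(\mathfrak{R}w)_Y=(f_{x_2},-f_{x_1},0)^T$, $f\in\mathring{W}_2^1(\Omega)$,
$(\mathfrak{R}w)_Z=(z_{x_1},z_{x_2},\Delta z)^T$, $z\in\Phi$. Using the decomposition (4.6), the orthogonality of the elements
$(\mathfrak{R}w)_X$, $(\mathfrak{R}w)_Y$, $(\mathfrak{R}w)_Z$ and the minimal property of the orthogonal projector $O_\alpha$, we find that

$$|w-O_\alpha w|_1^2\le|\mathfrak{R}^{-1}|_1^2\bigl(\|v-\Pi_lv\|^2_{W_2^1(\Omega)}+\|f-\Theta_pf\|^2_{W_2^1(\Omega)}+\|\Delta(z-\Gamma_qz)\|^2_{L_2(\Omega)}+\|\nabla(z-\Gamma_qz)\|^2_{L_2(\Omega)}\bigr).\qquad(4.7)$$

Since $\|\Delta u\|^2_{L_2(\Omega)}+\|\nabla u\|^2_{L_2(\Omega)}\le\|u\|^2_{W_2^2(\Omega)}$, the theorem now follows from (4.7) and the
hypotheses.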


Given a concrete choice of the subspaces $\{L_i\}$, $\{M_i\}$, $\{U_i\}$ we can estimate the rate of
convergence of the approximate to the exact eigenvalues. We shall confine ourselves to the case
when $\Omega$ is a circle. It can be assumed without loss of generality that the center of the circle is the
origin. As $L_i$ we take the set of polynomials of degree not higher than $i$ with respect to each of the
variables $x_1$, $x_2$, while as $U_i$ we take the linear hull of the first $i$ eigenfunctions of the operator $A_0$,
and we define $M_i$ by the expression

$$M_i=\{f\mid f=(x_1^2+x_2^2-\rho^2)v,\ v\in L_i\},$$

where $\rho$ is the radius of the circle $\Omega$.


To estimate the rate of convergence, we use the corollary to Theorem 2, after first
replacing the index $n$ in it by the multi-index $\alpha=(l,p,q)$ (see note on the corollary), and replacing
the eigenvalues $\lambda_m^0$ by their asymptotic form in $m$. The inequality (3.30) then takes the form

$$\lambda_i-\lambda_i^{(\alpha,m)}\le C_i(\lambda_i)\sum_{j=0}^{\varkappa}|(I_1-O_\alpha)Su_{i+j}|_1^2+o(m^{-1-2\beta}).\qquad(4.8)$$

The inequality (4.8) holds for the values of $\beta$ for which $u_j\in D(A_0^{0.5+\beta})$, $j=1,2,\ldots$. In the case
of a circle, it follows from the assumptions (4.1) that $u_j\in W_2^{k+2}(\Omega)$ (see e.g., [9]). Noting this, and
the results of [10], we can assert that $\beta=0.25-0.5\varepsilon$, where $\varepsilon>0$ is arbitrarily small, so that
the second term on the right of (4.8) has order $o(m^{-1.5+\varepsilon})$.


Let us find estimates for the quantities $|(I_1-O_\alpha)Su_j|_1^2$, $j=1,2,\ldots$. We write the
element $\mathfrak{R}Su_j$ as the sum of its projections onto $X$, $Y$, $Z$:

$$\mathfrak{R}Su_j=(\mathfrak{R}Su_j)_X+(\mathfrak{R}Su_j)_Y+(\mathfrak{R}Su_j)_Z.\qquad(4.9)$$

Let $v_j$, $f_j$, $z_j$ be elements of $W_2^1(\Omega)$, $\mathring{W}_2^1(\Omega)$ and $\Phi$, respectively, realizing the projections
in question, i.e., $(\mathfrak{R}Su_j)_X=(v_{jx_1},v_{jx_2},v_j)^T$, $(\mathfrak{R}Su_j)_Y=(f_{jx_2},-f_{jx_1},0)^T$, $(\mathfrak{R}Su_j)_Z=(z_{jx_1},z_{jx_2},\Delta z_j)^T$.
Proceeding in the same way as when obtaining (4.7), we obtain

$$|(I_1-O_\alpha)Su_j|_1^2\le2|\mathfrak{R}^{-1}|_1^2\bigl(\|v_j-\Pi_lv_j\|^2_{W_2^1(\Omega)}+\|f_j-\Theta_pf_j\|^2_{W_2^1(\Omega)}+\|z_j-\Gamma_qz_j\|^2_{W_2^2(\Omega)}\bigr).\qquad(4.10)$$

Estimation of the quantities $\|v_j-\Pi_lv_j\|_{W_2^1(\Omega)}$, $\|f_j-\Theta_pf_j\|_{W_2^1(\Omega)}$ and $\|z_j-\Gamma_qz_j\|_{W_2^2(\Omega)}$ is a familiar
problem in approximation theory and has in fact been solved. The answer given by this theory
depends on the differential and integral properties of the functions $v_j$, $f_j$, and $z_j$. Using the
decomposition (4.9), the conditions (4.1) and the relation $u_j\in W_2^{k+2}(\Omega)$, it is easily shown that
$v_j,f_j,z_j\in W_2^{k+2}(\Omega)$. We then obtain directly from the results of [11, 12] the estimates


|v; — wll wa) =O (I), (4.11) 
fi— vf sll wst(a2y9=O(p*“’). (4.12) 


We shall initially estimate the quantity $\|z_j-\Gamma_qz_j\|_{W_2^2(\Omega)}$ by using the minimal property of the
orthogonal projector $\Gamma_q$ and the equivalence of the norms $\|u\|_{W_2^2(\Omega)}$ and $\|A_0u\|_{L_2(\Omega)}$ (see [8]):

$$\|z_j-\Gamma_qz_j\|_{W_2^2(\Omega)}\le\Bigl\|z_j-\sum_{i=1}^q(z_j,v_i^0)v_i^0\Bigr\|_{W_2^2(\Omega)}\le C_1\Bigl\|A_0z_j-\sum_{i=1}^q(A_0z_j,v_i^0)v_i^0\Bigr\|_{L_2(\Omega)};\qquad(4.13)$$

then we apply Theorem 2 of [13] to the term closing the chain of inequalities (4.13). The result

thus obtained will depend on the value of the parameter o, for which the relation A z;]=D(Ao’). 
holds. We proved above that 2;=W 2** (Q). In accordance with [10], this ensures that A,z;=D(A,°") 
for k = 1, and Aoz;=D(A}--°* for k>1. For these cases, Theorem 2 of [13] gives respectively 


(4.14) 
q


L2(2) 


edits . 


i= 4 


Combining the estimates (4.8), (4.10)—(4.15), we finally get 


ee all} +-Olp  *) Folge ot), (4.16) 


where $\sigma(1)=0$, $\sigma(k)=0.5$ for $k>1$.


It should be mentioned that the estimate (4.16) is better than the estimate obtained for the 
same problem in [5] , where the Bazley—Fox method is used with a special choice of test spaces. 


Translated by D. E. Brown 
STATIONARY STRATEGIES IN DIFFERENTIAL GAMES*
O. A. MALAFEEV
Leningrad
(Received 10 January 1975; revised 8 December 1975)


DIFFERENTIAL games with dependent movements are considered. The class of mixed stationary
strategies is shown to include $\varepsilon$-equilibrium situations.


The dynamic behaviour of the games considered below is specified by the differential system

$$\dot x=f(x,u,v),\qquad(1)$$

which satisfies the following conditions:

1) $x\in R^m=X$, where $R^m$ is $m$-dimensional Euclidean space, $t\in[0,\infty)$, $u\in U\subset R^p$, $v\in V\subset R^q$; $U$, $V$ are compact sets;
2) $f$ is continuous with respect to $(x,u,v)$ in $R^m\times U\times V$;
3) $f$ satisfies a Lipschitz condition with respect to $x$;
4) positive numbers $M$ and $M'$ exist such that, for any $x\in R^m$,
$\|f(x,u,v)\|\le M+M'\|x\|$,
where $\|x\|$ is the norm of $x$;
5) the sets $F(x)=\{y\mid y=f(x,u,v),\ u\in U,\ v\in V\}$ are convex and closed for all $x\in R^m$.


Definition 1. An admissible control in the set $U$ (or $V$) is a measurable function $u:[t_0,t_1]\to R^p$
(or $v:[t_0,t_1]\to R^q$) such that $u(t)\in U$ (or $v(t)\in V$) for any $t\in[t_0,t_1]$.


Definition 2. A trajectory of the differential system (1) in $[t_0,t_1]$ is an absolutely continuous
function $x:[t_0,t_1]\to R^m$ for which admissible controls $u$, $v$ exist such that $\dot x(t)=f(x(t),u(t),v(t))$
almost everywhere in $[t_0,t_1]$; $(x(t_0),t_0)$ is called the start, and $(x(t_1),t_1)$ the end, of
the trajectory $x(t)$.


Definition 3. The set $F(x_0,t_0,t_1)$ of vectors $x\in R^m$ for which a trajectory $x(t)$ exists in
$[t_0,t_1]$ with the start $(x_0,t_0)$ and the end $(x,t_1)$ is called the set of attainability of the system (1)
from the point $(x_0,t_0)$ in the time $t_1-t_0$.
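
A crude picture of the set of attainability can be obtained by restricting both players to finitely many control values held constant on each step and integrating with Euler's method. The sketch below is only such an approximation under stated assumptions: the scalar state, the right-hand side $f=u+v$, the control sets and the function name `reachable_set` are all hypothetical illustrations, not part of the paper.

```python
def reachable_set(f, x0, t0, t1, controls_u, controls_v, steps):
    """Approximate attainable states at time t1 for a scalar system x' = f(x, u, v).

    Controls are piecewise constant on a uniform division of [t0, t1] and take
    values in the finite sets controls_u, controls_v; each step is one Euler step.
    """
    h = (t1 - t0) / steps
    states = {x0}
    for _ in range(steps):
        states = {round(x + h * f(x, u, v), 12)
                  for x in states for u in controls_u for v in controls_v}
    return sorted(states)

# Assumed toy dynamics: x' = u + v with u in {-1, 1}, v in {-0.5, 0.5}.
points = reachable_set(lambda x, u, v: u + v, 0.0, 0.0, 1.0,
                       [-1.0, 1.0], [-0.5, 0.5], steps=4)
print(points)
```

For these dynamics the exact attainability set at $t_1=1$ is the interval $[-1.5,1.5]$; the finite construction returns 13 equally spaced points of it, and refining the division or the control sets fills the interval more densely.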


Condition 5) is not essential, and is added merely to simplify the treatment, while 
condition 4) can be replaced by the requirement that the solutions of system (1) be continuable into 
the interval in which the game is considered; this is well known to involve no loss of generality. 





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 42—51, 1977. 







Let us mention two propositions concerning $F(x_0,t_0,t)$, required later. The proofs can be
found in [1].


Proposition 1. For any $x_0\in R^m$, $t_0\le t\in[0,\infty)$ the set $F(x_0,t_0,t)$ is a non-empty compact
subset of $R^m$ and, regarded as a function of $(x_0,t_0,t)$, is continuous in aggregate in the
Hausdorff metric; and $F(x_0,t_0,t_0)=x_0$.


Proposition 2. The mapping $\pi[x_0,t_0,t_1]$, which associates the pair of controls $u(t)$, $v(t)$ in the
interval $[t_0,t_1]$ and at the point $x_0$ with a trajectory of system (1), $\pi[x_0,t_0,t_1]:U\times V\to\mathcal{F}(x_0,t_0,t_1)$,
is continuous.


Here, $\mathcal{F}(x_0,t_0,t_1)$ is the space of trajectories of system (1) starting at the point $x_0$, furnished
with a uniform metric in the interval $[t_0,t_1]$.


Let $(u_1,v_1)*(u_2,v_2)$ denote the admissible control of system (1) in the interval $[t_0,t_2)=[t_0,t_1)\cup[t_1,t_2)$,
the restriction of which to $[t_0,t_1)$ is the same as $(u_1,v_1)$, and whose
restriction to $[t_1,t_2)$ is the same as $(u_2,v_2)$.


1. The game $\Gamma(x_0,T,U,V)$ starts from the point $x_0\in X$ at the instant $t_0=0$ and ends at
the instant $T<\infty$. Let $E$ be the maximizing, and $P$ the minimizing player, of the game; at any
instant $t\in[0,T]$ they both know $t$ and the state $x(t)$ of the game at $t$. Let $\Sigma_T$ be the set of
finite divisions $\sigma=\{t_0^\sigma=0<t_1^\sigma<\ldots<t_{N_\sigma}^\sigma=T\}$ of the interval $[0,T]$.


The strategy $\varphi$ (or $\psi$) of the player $P$ (or $E$) in the game $\Gamma(x_0,T,U,V)$ is the pair $(\xi,\Phi)$
(or $(\eta,\Psi)$), where $\xi,\eta\in\Sigma_T$, $\Phi=\{\varphi_\sigma\}_{\sigma\in\Sigma_T}$ (or $\Psi=\{\psi_\sigma\}_{\sigma\in\Sigma_T}$). Here $\varphi_\sigma$ (or $\psi_\sigma$) is the strategy
of $P$ (or $E$) in the discrete game $\Gamma^\sigma(x_0,T,U,V)$, determined for the division $\sigma$, i.e., the
mapping associating the information state of $P$ (or $E$) at the instant $t_i\in\sigma$ with the probability
measure $\mu_i=\mu(t_i,x(t_i))$ (or $\nu_i=\nu(t_i,x(t_i))$) in $U$ (or $V$) at the position $x(t_i)$.


The terminal pay-off in the game $\Gamma(x_0,T,U,V)$ is specified by the function $H$, which
satisfies a Lipschitz condition in $X$. In the game $\Gamma(x_0,T,U,V)$ the pay-off in the situation
$(\varphi,\psi)=((\xi,\Phi),(\eta,\Psi))$ is the mathematical expectation of the pay-off in the discrete game
$\Gamma^\sigma(x_0,T,U,V)$ in the situation $(\varphi_\sigma,\psi_\sigma)$, where $\sigma=\xi\cup\eta$. We denote it by $H(\varphi,\psi)$.


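Theorem 1


Given any $\varepsilon>0$ there exists an $\varepsilon$-equilibrium situation in the game $\Gamma(x_0,T,U,V)$ for any
$x_0\in X$, $T<\infty$.


Proof. It follows from [2, 3] that, for every sequence $\{\sigma_n\}_{n=1}^\infty$ of divisions of the interval
$[0,T]$ such that

$$|\sigma_n|=\max_{1\le i\le N_{\sigma_n}}(t_i-t_{i-1})\to0,$$

there exists

$$\lim_{n\to\infty}\operatorname{Val}(\Gamma^{\sigma_n}(x_0,T,U,V))=V(x_0,T),$$

which is common for all such sequences.


We specify $\varepsilon>0$ and choose $\alpha>0$ such that, for every $\sigma\in\Sigma_T$ with $|\sigma|<\alpha$, we have
$|V(\cdot)-\operatorname{Val}(\Gamma^\sigma(\cdot))|<\varepsilon$. We put

$$\varphi^\varepsilon=(\sigma_1,\{\varphi_\sigma^*\}_{\sigma\in\Sigma_T}),\qquad\psi^\varepsilon=(\sigma_2,\{\psi_\sigma^*\}_{\sigma\in\Sigma_T}).$$

Here $|\sigma_i|<\alpha$, $i=1,2$; $\varphi_\sigma^*$ (or $\psi_\sigma^*$) is the optimal strategy of $P$ (or $E$) in the game $\Gamma^\sigma(x_0,T,U,V)$.
We put $\delta=\sigma_1\cup\sigma_2$. Then, in view of the choice of $\sigma_1$, $\sigma_2$, we have $|V(\cdot)-\operatorname{Val}(\Gamma^\delta(\cdot))|<\varepsilon$,
while in view of the choice of $\varphi_\delta^*$, $\psi_\delta^*$ we have $H^\delta(\varphi_\delta^*,\psi_\delta)\le H^\delta(\varphi_\delta^*,\psi_\delta^*)\le H^\delta(\varphi_\delta,\psi_\delta^*)$ for any
strategies $\varphi_\delta$, $\psi_\delta$ of players $P$ and $E$ in the game $\Gamma^\delta(x_0,T,U,V)$.


Consequently, for the strategies $\varphi$, $\psi$ of players $P$ and $E$ in the game $\Gamma(x_0,T,U,V)$ we
have $H(\varphi^\varepsilon,\psi)-\varepsilon\le H(\varphi^\varepsilon,\psi^\varepsilon)\le H(\varphi,\psi^\varepsilon)+\varepsilon$.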
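
The proof relies on the fact that the union of two divisions is again a division whose mesh does not exceed either of the original meshes, so that $|\sigma_1\cup\sigma_2|<\alpha$. A minimal sketch of this bookkeeping follows; the function names and the sample divisions of $[0,1]$ are illustrative assumptions, with exact rational arithmetic used to avoid rounding.

```python
from fractions import Fraction

def mesh(division):
    """Largest gap between consecutive points of a sorted division."""
    return max(b - a for a, b in zip(division, division[1:]))

def common_refinement(sigma1, sigma2):
    """Union of two divisions of the same interval, returned in increasing order."""
    return sorted(set(sigma1) | set(sigma2))

sigma1 = [Fraction(0), Fraction(1, 2), Fraction(1)]
sigma2 = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
delta = common_refinement(sigma1, sigma2)

print(delta, mesh(delta))
```

Here the refinement has mesh $1/3$, which is no larger than the meshes $1/2$ and $1/3$ of the two original divisions.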


Along with the game $\Gamma^\sigma(x_0,T,U,V)$ we shall consider the approximating game $\Gamma^\sigma(x_0,T,U_\varepsilon,V_\varepsilon)$,
where $U_\varepsilon$, $V_\varepsilon$ are finite subsets of the sets $U$ and $V$ respectively.


Let us now recall the definition of a recursive game $\Gamma$. It is a finite collection of $n$ antagonistic
component games $\Gamma_k$, i.e., $\Gamma=\{\Gamma_1,\ldots,\Gamma_n\}$, each situation $(\varphi^k,\psi^k)\in\Phi^k\times\Psi^k$, $k=1,2,\ldots,n$,
being associated with the generalized pay-off

$$H^k(\varphi^k,\psi^k)=p^ke^k+\sum_{j=1}^nq^{kj}\Gamma_j.$$

The generalized pay-off means that the player $E$ obtains from player $P$ the quantity $e^k$ with
probability $p^k$, and the recursive game moves over, with probability $q^{kj}$, to the new state, i.e., the
component $\Gamma_j$. By the strategy $\varphi$ of the player $P$ in the game $\Gamma$ we mean the sequence $\varphi=\{\varphi_t\}_{t=1}^\infty$,
where $\varphi_t=(\varphi_t^i:\ i=1,2,\ldots,n)$, so that, if $P$ is in the
component game $\Gamma_k$ at the instant $t$, then he employs the strategy $\varphi_t^k$. The strategy $\psi$ of the player
$E$ is defined in a similar way. The value of the pay-off function at a point of $\Gamma_i$ in the situation
$(\varphi,\psi)$ is taken to be equal to the mathematical expectation of the pay-off during a random walk in
the situation $(\varphi,\psi)$ from the initial position $\Gamma_i$. The pay-off in the situation $(\varphi,\psi)$ is thus defined
as

$$H(\varphi,\psi)=(H^1(\varphi,\psi),\ldots,H^n(\varphi,\psi)).$$


We say that the component game $\Gamma_k$ satisfies the minimax condition if the game obtained by
replacement of the generalized pay-off $H^k$ by the pay-off

$$H^k(\varphi^k,\psi^k,W)=p^ke^k+\sum_{j=1}^nq^{kj}W_j,$$

where $W\in R^n$, has a minimax solution for any $W\in R^n$.


Let $\mathbf{1}=(1,\ldots,1)$. The recursive game $\Gamma$ will be said to have a solution if a vector $V\in R^n$
exists such that, for every $\varepsilon>0$, strategies $\varphi_\varepsilon\in\Phi$, $\psi_\varepsilon\in\Psi$ exist such that, for any $\varphi\in\Phi$ or
$\psi\in\Psi$, we have

$$H(\varphi,\psi_\varepsilon)-\varepsilon\cdot\mathbf{1}\le V\le H(\varphi_\varepsilon,\psi)+\varepsilon\cdot\mathbf{1}.$$

The vector $V$ is called the value of the game $\Gamma$, while $\varphi_\varepsilon$, $\psi_\varepsilon$ are called the $\varepsilon$-optimal strategies
of players $P$ and $E$ respectively.

The strategy $\varphi$ (or $\psi$) is said to be stationary in the component $i$ if $\varphi_t^i=\varphi_1^i$ (or $\psi_t^i=\psi_1^i$) for
any $t$. The strategy $\varphi$ (or $\psi$) is stationary if it is stationary in all the components.


Theorem 2 


Every recursive game $\Gamma$ whose game components have bounded pay-offs and satisfy the
minimax condition possesses a value; $\varepsilon$-optimal stationary strategies exist for the players $P$ and $E$.


For the proof, see [4]. 


Lemma 1

Given any $x_0\in X$, $T<\infty$, $\sigma\in\Sigma_T$ and any finite sets $U_\delta\subset U$, $V_\delta\subset V$, the game $\Gamma^\sigma(x_0,T,U_\delta,V_\delta)$
is recursive, and there exist in it $\varepsilon$-equilibrium situations in stationary strategies.

Proof. With every point $x=x_0$,
$x\in\pi[x(t_i),t_i,t_{i+1}](U_\delta,V_\delta)(t_{i+1})$, $i=0,\ldots,N_\sigma-1$,
we associate a game component $\Gamma_x$ as follows. The spaces of the players' strategies in the game are
$U_\delta$, $V_\delta$. With each situation $(u,v)\in U_\delta\times V_\delta$ is associated the generalized pay-off

$$H_x(u,v)=\begin{cases}H(x)\cdot1,&\text{if }x\in F(x(t_{N_\sigma-1}),t_{N_\sigma-1},T),\\ \Gamma_y\cdot1&\text{otherwise},\end{cases}\qquad y=\pi[x,t_{i+1},t_{i+2}](u,v)(t_{i+2}).$$

It is easily seen that relations (2) are satisfied here, and hence the game $\Gamma^\sigma(x_0,T,U_\delta,V_\delta)$
is recursive. In view of the finiteness, every game component $\Gamma_x$ satisfies the minimax condition,
and by Theorem 2, the game $\Gamma^\sigma(x_0,T,U_\delta,V_\delta)$ has a value, while the players $P$ and $E$ have
$\varepsilon$-optimal stationary strategies; and in view of the finiteness of $\sigma$, we have $\varepsilon=0$ here.


Lemma 2


The value function $\operatorname{Val}(\Gamma^\sigma(x_0,T,U_\delta,V_\delta))=V^\sigma(\cdot)$ is continuous with respect to
$x_0\in X$, $T<\infty$.


The proof is similar to that of Lemma 2 in [5].


Let $\rho_{t_0,T}$ denote a uniform metric in the space of functions continuous in the set $F(x_0,t_0,T)$,
and let $\Gamma^\sigma(x_0,T,U,V,H)$ denote the game in which the terminal pay-off is specified by
the function $H$, continuous in $X$.


Lemma 3

For any $\varepsilon>0$, there exists $\delta>0$ such that, if $\rho_{t_0,T}(H,H')<\delta$, then

$$|V^\sigma(\cdot,H)-V^\sigma(\cdot,H')|<\varepsilon.$$

Proof. Consider the functional equations of the game $\Gamma^\sigma(x_0,T,U,V,H)$:

$$V^\sigma(x_0,T,\cdot)=\int_U\int_VV^{\sigma'}(\pi[x_0,t_0,t_1](u,v)(t_1),T-t_1,\cdot)\,d\mu_0\,d\nu_0,\ \ldots\qquad(3)$$

Here, $\mu_i$, $\nu_i$ are the optimal probability measures, on which the values are reached in the
relevant equations. For the proof, we use induction on $n$, i.e., on the number of interior points in
the division $\sigma$. For $n=0$ the lemma follows directly from the properties of the integral. Now
assume that the lemma holds for $N_\sigma=n+1$, and let us show that it holds for the case when $\sigma$
contains $n+1$ interior points. In the set $F(x_0,t_0,t_1)$ we consider the function

$$V^{\sigma'}(x_1,T-t_1,\cdot),\qquad\sigma'=\{t_1<\ldots<t_{N_\sigma}=T\}.$$

By the inductive hypothesis, for every $\varepsilon>0$ and any $x_1\in F(x_0,t_0,t_1)$ there exists $\delta(\varepsilon,x_1)$ such
that, if $\rho_{t_0,T}(H,H')<\delta(\varepsilon,x_1)$, then

$$|V^{\sigma'}(x_1,T-t_1,\cdot,H)-V^{\sigma'}(x_1,T-t_1,\cdot,H')|<\varepsilon.$$

We consider the number

$$\delta(\varepsilon)=\inf_{x_1\in F(x_0,t_0,t_1)}\delta(\varepsilon,x_1)$$

and we aim to show that $\delta(\varepsilon)>0$. In fact, if $\delta(\varepsilon)=0$, then there exists a sequence $\{x_1^i\}_{i=1}^\infty$
of points of the compact set $F(x_0,t_0,t_1)$ such that $\delta(\varepsilon,x_1^i)\to0$, $i\to\infty$, while

$$\lim_{i\to\infty}x_1^i=x_1^\infty\in F(x_0,t_0,t_1).$$

For the point $x_1^\infty$ the number $\delta(\varepsilon,x_1^\infty)>0$ will not exist; but this contradicts the
inductive hypothesis. We thus find that, given any $\varepsilon>0$, there exists $\delta>0$ such that, if $\rho_{t_0,T}(H,H')<\delta$, then

$$\rho_{t_0,t_1}(V^{\sigma'}(x_1,T-t_1,\cdot,H),V^{\sigma'}(x_1,T-t_1,\cdot,H'))<\varepsilon.$$

From the functional equations (3) and the assertion of the lemma in the case $N_\sigma=n+1$ we obtain
the lemma in the case $N_\sigma=n+2$. For, let us specify $\varepsilon>0$. Then $\delta>0$ exists such that, if

$$\rho_{t_0,t_1}(V^{\sigma'}(x_1,T-t_1,\cdot,H),V^{\sigma'}(x_1,T-t_1,\cdot,H'))<\delta,$$

then

$$|V^\sigma(x_0,T,\cdot,H)-V^\sigma(x_0,T,\cdot,H')|<\varepsilon.$$

And for the given $\delta>0$, there exists $\eta>0$ such that, if $\rho_{t_0,T}(H,H')<\eta$, then

$$\rho_{t_0,t_1}(V^{\sigma'}(x_1,T-t_1,\cdot,H),V^{\sigma'}(x_1,T-t_1,\cdot,H'))<\delta.$$


Instead of the game $\Gamma(x_0,T,U,V)$ it will be convenient below for us to deal with the
similarly defined game $\tilde\Gamma(x_0,T,U,V)$. Let $K(A)$ be the family of all finite subsets of the set
$A$. Player $P$'s strategy

$$\varphi=(\xi,U',\{\varphi_\sigma(U'',V'')\}_{\sigma\in\Sigma_T,\ U''\in K(U),\ V''\in K(V)})$$

in this game is specified by the division $\xi\in\Sigma_T$, the set $U'\in K(U)$ and the set of $P$'s strategies
in all the possible games $\Gamma^\sigma(x_0,T,U'',V'')$. The player $E$'s strategy is defined in a similar way:

$$\psi=(\eta,V',\{\psi_\sigma(U'',V'')\}_{\sigma\in\Sigma_T,\ U''\in K(U),\ V''\in K(V)}).$$

The pay-off in the situation $(\varphi,\psi)$ is defined as follows:

$$H(\varphi,\psi)=H^\sigma(\varphi_\sigma(U',V'),\psi_\sigma(U',V')),\qquad\sigma=\xi\cup\eta.$$

Definition 4. The strategy $\varphi$ (or $\psi$) of player $P$ (or $E$) in the game $\tilde\Gamma(x_0,T,U,V)$ is
stationary if all the strategies $\{\varphi_\sigma(U'',V'')\}$ (or $\{\psi_\sigma(U'',V'')\}$) participating in the definition
of the strategy $\varphi$ (or $\psi$) are stationary in the respective discrete games. We shall denote the optimal
players' strategies by $\varphi_\sigma^*$, $\psi_\sigma^*$.


Theorem 3


In the game $\tilde\Gamma(x_0,T,U,V)$ there exist $\varepsilon$-equilibrium situations in stationary strategies for
any $x_0\in X$, $T<\infty$, $\varepsilon>0$.


Proof. It is easy to show (see e.g., [6]) that, given any one-step game $\Gamma(A,B)$ with continuous
pay-off function on a product of compact metric spaces $A$ and $B$, there exist, for all $\varepsilon>0$, finite
sets $A_\varepsilon\in K(A)$, $B_\varepsilon\in K(B)$ such that each of the values $\operatorname{Val}(\Gamma(A_\varepsilon,B_\varepsilon))$, $\operatorname{Val}(\Gamma(A_\varepsilon,B))$,
$\operatorname{Val}(\Gamma(A,B_\varepsilon))$ differs from $\operatorname{Val}(\Gamma(A,B))$ by not more than $\varepsilon$.
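
The role of the finite sets $A_\varepsilon$, $B_\varepsilon$ can be seen on a small example. The sketch below is an illustration under stated assumptions, not the construction of [6]: the pay-off $(a-b)^2$ on $[0,1]\times[0,1]$, the grid standing in for the compact sets, and the function names are all hypothetical. The minimizing player uses the single point $a=1/2$, the maximizing player mixes the two points $b=0$ and $b=1$ equally, and the code computes what each of these finitely supported strategies guarantees against every grid reply.

```python
from fractions import Fraction

def payoff(a, b):
    # Assumed continuous pay-off on [0, 1] x [0, 1]; P chooses a, E chooses b.
    return (a - b) ** 2

def upper_guarantee(a, replies):
    """Largest pay-off the minimizing player can suffer when playing the pure point a."""
    return max(payoff(a, b) for b in replies)

def lower_guarantee(mixed, replies):
    """Smallest expected pay-off the maximizing player gets with the mixed strategy."""
    return min(sum(prob * payoff(a, b) for b, prob in mixed.items())
               for a in replies)

grid = [Fraction(i, 100) for i in range(101)]      # stand-in for the interval [0, 1]
upper = upper_guarantee(Fraction(1, 2), grid)
lower = lower_guarantee({Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)}, grid)
print(lower, upper)
```

Both guarantees equal $1/4$, so in this example the one-point set $\{1/2\}$ and the two-point set $\{0,1\}$ already reproduce the value of the game on the grid with $\varepsilon=0$.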

Let us show, by induction on the number of points in the division $\sigma\in\Sigma_T$, that, given any
$\varepsilon>0$, there exist finite sets $U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$ such that

$$|V^\sigma(x_0,T,U,V)-V^\sigma(x_0,T,U_\varepsilon,V_\varepsilon)|<\varepsilon.$$

For the case $N_\sigma=1$, i.e., when the division $\sigma$ contains no interior points, the game $\Gamma^\sigma(x_0,T,U,V)$
is a one-step game, and the theorem follows at once from the above-mentioned results of [6].


Assume that the theorem holds for games $\Gamma^\sigma(x_0,T,U,V)$ such that $\sigma$ contains not more
than $n$ interior points; we shall show that it then holds for games whose division contains $n+1$
interior points. Assume that the $(n+1)$-th point of the division is $t'\in(t_m,t_{m+1})$. Consider in the set
$F(x_0,t_0,t')$ the function $V^{\sigma'}(x',T-t',U_\varepsilon,V_\varepsilon)$, which, by Lemma 2, is continuous. Here $\sigma'=\{t'<t_{m+1}<\ldots<T\}$.


By the inductive hypothesis and the continuity of the function $V^{\sigma'}(x',T-t',U_\varepsilon,V_\varepsilon)$ on the
compact set $F(x_0,t_0,t')$, given any $\varepsilon>0$, no matter how small, we can choose finite sets
$U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$ in such a way that

$$\rho_{t_0,t'}(V^{\sigma'}(x',T-t',U_\varepsilon,V_\varepsilon),V^{\sigma'}(x',T-t',U,V))<\varepsilon.$$

By Lemma 3, given any $\varepsilon'>0$, there exists $\varepsilon>0$ such that, if

$$\rho_{t_0,t'}(V^{\sigma'}(x',T-t',U_\varepsilon,V_\varepsilon),V^{\sigma'}(x',T-t',U,V))<\varepsilon,$$

then

$$|V^\sigma(x_0,T,U,V)-\operatorname{Val}(\Gamma_\varepsilon^\sigma(x_0,t',U,V))|<\varepsilon',\qquad(4)$$

where $\Gamma_\varepsilon^\sigma(x_0,t',U,V)$ is the game with pay-off function $V^{\sigma'}(x',T-t',U_\varepsilon,V_\varepsilon)$, specified on
$F(x_0,t_0,t')$.


It follows from the inductive hypothesis that, given any $\varepsilon>0$, there exist $U_\varepsilon\in K(U)$,
$V_\varepsilon\in K(V)$ such that

$$|V^\sigma(x_0,T,U_\varepsilon,V_\varepsilon)-\operatorname{Val}(\Gamma_\varepsilon^\sigma(x_0,T,U,V))|<\varepsilon.\qquad(5)$$
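From relations (4) and (5), given $\varepsilon>0$, there exist $U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$ such that

$$|V^\sigma(x_0,T,U_\varepsilon,V_\varepsilon)-V^\sigma(x_0,T,U,V)|<\varepsilon.\qquad(6)$$

It can be shown in a similar way that, given any $\varepsilon>0$, there exist $U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$ such
that we have respectively

$$|V^\sigma(x_0,T,U_\varepsilon,V)-V^\sigma(x_0,T,U,V)|<\varepsilon,\qquad(7)$$

$$|V^\sigma(x_0,T,U,V_\varepsilon)-V^\sigma(x_0,T,U,V)|<\varepsilon.\qquad(8)$$

Let us turn to a direct proof of the theorem. We specify the number $\varepsilon>0$. By the inequalities
(6)-(8) and Theorem 1, there exist $U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$, $\sigma_1$, $\sigma_2$ such that

$$\Bigl|\lim_{|\sigma|\to0}V^\sigma(x_0,T,U,V)-V^{\sigma_1}(x_0,T,U_\varepsilon,V)\Bigr|<\varepsilon,$$

$$\Bigl|\lim_{|\sigma|\to0}V^\sigma(x_0,T,U,V)-V^{\sigma_2}(x_0,T,U,V_\varepsilon)\Bigr|<\varepsilon.$$

In the game $\tilde\Gamma(x_0,T,U,V)$ we define the strategies $\varphi^\varepsilon$, $\psi^\varepsilon$ of players $P$ and $E$ respectively as:

$$\varphi^\varepsilon=(\sigma_1,U_\varepsilon,\{\varphi_\sigma^*(U'',V'')\}_{\sigma\in\Sigma_T,\ U''\in K(U),\ V''\in K(V)}),$$

$$\psi^\varepsilon=(\sigma_2,V_\varepsilon,\{\psi_\sigma^*(U'',V'')\}_{\sigma\in\Sigma_T,\ U''\in K(U),\ V''\in K(V)}).$$

Here, by Lemma 1, the strategies $\varphi_\sigma^*(\cdot)$, $\psi_\sigma^*(\cdot)$ can be assumed to be stationary. By definition
of the strategies $\varphi^\varepsilon$, $\psi^\varepsilon$, we have, for all $\varphi\in\Phi$, $\psi\in\Psi$,

$$H(\varphi^\varepsilon,\psi)-\varepsilon\le H(\varphi^\varepsilon,\psi^\varepsilon)\le H(\varphi,\psi^\varepsilon)+\varepsilon.$$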


2. The discussions of Sec. 1 can be extended in a natural way to the case of time-optimal
games with dependent movements in the space $X$. We shall first construct the approximating
multi-step games; by means of the results of [4] we shall prove the existence of equilibrium
situations; then finally, we shall prove an existence theorem for a continuous time-optimal game,
defined in a similar way to the game $\tilde\Gamma(x_0,T,U,V)$ of Sec. 1.


We isolate in $X$ a set $M$, which we call terminal, and, after fixing a point $x_0\in X$, we consider
the set

$$C=\bigcup_{t\in[0,\infty)}F(x_0,t_0,t).$$

Assume that it is compact. We specify a number $\varepsilon>0$ and then find

$$T_\varepsilon=\min_{t\in[0,\infty)}\{t\mid\rho(C_t,X\setminus C)<\varepsilon\},$$

where

$$C_t=\bigcup_{\tau\in[0,t]}F(x_0,t_0,\tau)$$

and $\rho$ is the Hausdorff metric.


We also fix the sets $U_\varepsilon\in K(U)$, $V_\varepsilon\in K(V)$ and a division $\sigma_\varepsilon\in\Sigma_{T_\varepsilon}$ such that the set

$$D(x_0,\varepsilon)=\{\pi[x_i,t_i,t_{i+1}](U_\varepsilon,V_\varepsilon)(t_{i+1})\},\quad i=0,\ldots,N_{\sigma_\varepsilon},$$

forms an $\varepsilon$-net of the set $C_{T_\varepsilon}$. For simplicity, we put here $|t_i-t_{i-1}|=\delta_\varepsilon=\delta$ for $i=1,2,\ldots,N_{\sigma_\varepsilon}$.

We now construct the dynamic multi-step game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$. It proceeds as follows. At the
instant $t_0=0$ the two players $P$ and $E$, knowing the initial position $x_0$ and the instant $t_0$, choose
respectively the points $u_0\in U_\varepsilon$, $v_0\in V_\varepsilon$; as a result of this, the game moves from the state $x_0$ to
the state $x_1=\pi[x_0,t_0,t_1](u_0,v_0)(t_1)$, etc. At the instant $t_i$ the game is in the state $x_i=x(t_i)$.
If $\rho(x_i,M)<\varepsilon$, then the game ends and the player $E$ obtains from $P$ the pay-off $t_i$.


If $y\in\pi[x(t_j),t_j,t_{j+1}](U_\varepsilon,V_\varepsilon)(t_{j+1})$ exists such that $i>j+1$, $\rho(x_i,y)<\varepsilon$, then the
state $x_i$ is replaced by the state $y$.


In the other cases, at the instant $t_i$, the players $P$ and $E$, knowing the game state $(t_i,x(t_i))$,
choose respectively the points $u_i\in U_\varepsilon$, $v_i\in V_\varepsilon$ and as a result the game moves from the state $x_i$ to
the state $x_{i+1}=\pi[x_i,t_i,t_{i+1}](u_i,v_i)(t_{i+1})$.


Definition 5. We define the strategy $\varphi$ (or $\psi$) of the player $P$ (or $E$) in the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$
as the mapping which associates the informational state of player $P$ (or $E$) at the instant $t_i$ with the
probability distribution $\mu(x_i,t_i)$ (or $\nu(x_i,t_i)$) at the position $x_i=x(t_i)$ in the set $U_\varepsilon$ (or $V_\varepsilon$).


The pay-off in the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$ in the situation $(\varphi,\psi)$ is the mathematical
expectation of the time of the game.


Definition 6. We define a stochastic univalent game $\Gamma$ as the collection $\{\Gamma_i\}_{i=1}^n$ of component
games with players' strategy spaces $\Phi_i$, $\Psi_i$ and a generalized pay-off of the form

$$H_i(\varphi_i,\psi_i)=e_i+p_iS+\sum_jq_{ij}\Gamma_j,$$

where $p_i,q_{ij}\ge0$, $p_i+\sum_jq_{ij}=1$, and $e_i$ is the non-negative pay-off at the $i$-th step, obtained by
player $E$ regardless of whether the game ends or not; $p_i$ is the probability of termination of the
game, and $q_{ij}$ is the probability of the game moving from the component $i$ to the component $j$
(see [4]).
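
Once both players fix stationary strategies, the quantities $e_i$ and $q_{ij}$ become plain numbers and the expected total pay-off $W_i$ from the component $i$ satisfies $W_i=e_i+\sum_jq_{ij}W_j$. The sketch below solves this system by simple iteration; it is an illustration only, with a hypothetical two-component example and a hypothetical function name, and it assumes every $p_i>0$ so that the iteration converges.

```python
def expected_payoff(stage_payoffs, transitions, iterations=200):
    """Iterate W <- e + Q W, starting from W = 0.

    stage_payoffs[i] is e_i; transitions[i][j] is q_ij, the probability of moving
    from component i to component j. The termination probability is
    p_i = 1 - sum_j q_ij, assumed positive here.
    """
    n = len(stage_payoffs)
    w = [0.0] * n
    for _ in range(iterations):
        w = [stage_payoffs[i] + sum(transitions[i][j] * w[j] for j in range(n))
             for i in range(n)]
    return w

# Assumed example: unit pay-off per step, termination probability 1/2 in each component.
values = expected_payoff([1.0, 1.0], [[0.0, 0.5], [0.5, 0.0]])
print(values)
```

With a unit stage pay-off the result is the expected number of steps until termination, here 2 from either component. If some set of components could be held with probability one, the iteration would grow without bound, which corresponds to the "trap" situation discussed in the text.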
Lemma 4


The game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$ is a univalent stochastic game.


Proof. We associate the point $x=x_0$,
$x\in\pi[x(t_i),t_i,t_{i+1}](U_\varepsilon,V_\varepsilon)(t_{i+1})$, $i=0,1,\ldots$,
with the component game $\Gamma_x$ as follows. The strategy spaces in these games are $U_\varepsilon$, $V_\varepsilon$. With each
pair $(u,v)\in U_\varepsilon\times V_\varepsilon$ we associate the generalized pay-off $H_x(u,v)=1\cdot S$ if $\rho(x,M)<\varepsilon$;
$H_x(u,v)=1\cdot\Gamma_y$ if there exists $y\in\pi[x(t_j),t_j,t_{j+1}](U_\varepsilon,V_\varepsilon)(t_{j+1})$ such that $\rho(x,y)<\varepsilon$,
$i>j+1$; $H_x(u,v)=1\cdot\Gamma_y+\delta$ in the remaining cases, where $y=\pi[x,t_{i+1},t_{i+2}](u,v)(t_{i+2})$.


The lemma can now be proved directly.


Definition 7. The set $\Gamma^*=\{\Gamma_x\}$ of component games of the stochastic univalent game $\Gamma$ is
said to be a "trap" if, every time that the game hits a component of $\Gamma^*$, the player $E$ can guarantee
that the game stays in the components of $\Gamma^*$ for an infinite time, so that he thereby obtains an
infinitely large pay-off.


We know from [4] that, if there are no traps in a univalent stochastic game and every
component game has a minimax solution, then a solution exists in the game $\Gamma$. Hence we obtain
the following proposition concerning the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$.

Theorem 4


If, in the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$, the player $P$ has a strategy guaranteeing him finite
mathematical expectation of the game time, then a $\xi$-equilibrium situation will exist in the game
for any $\xi>0$.


Now assume that the value of the game can become infinite; this means that, for any $\alpha>0$,
there exists $\psi_\alpha\in\Psi$ such that $H(\varphi,\psi_\alpha)>\alpha$ for any $\varphi\in\Phi$.


Under this assumption, it follows from the definition of the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$ and the
results of [4] that:


Theorem 5


Given any $\xi>0$, a $\xi$-equilibrium situation exists in the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$.


We shall now define the multi-step game $\Gamma^\sigma(x_0,U_\varepsilon,V_\varepsilon)$. Here, $\sigma$ is a division of $[0,\infty)$
containing no limit points. Let $\Sigma_\infty$ denote the set of all such divisions. In this game, at every instant
$t_i\in\sigma$ the values of $t_i$ and $x(t_i)$ are known to both players.


The definition of the strategy $\varphi_\sigma$ (or $\psi_\sigma$) of player $P$ (or $E$) in the game is similar to the
definition in the game $\Gamma^{\sigma_\varepsilon}(x_0,U_\varepsilon,V_\varepsilon)$.


Definition 8. The player $P$'s strategy $\varphi_\sigma^*$ is called successful if, for any strategy $\psi_\sigma$ of player
$E$, the time of the game $\Gamma^\sigma(x_0,U_\varepsilon,V_\varepsilon)$ in the situation $(\varphi_\sigma^*,\psi_\sigma)$ is finite, and moreover,

$$\sup_{\psi_\sigma}t(\varphi_\sigma^*,\psi_\sigma)<\infty.$$

Lemma 5


If, in the game $\Gamma^\sigma(x_0,U_\varepsilon,V_\varepsilon)$, a successful strategy $\varphi_\sigma^*$ is available to the player $P$, then
$\xi$-equilibrium situations exist in the game for all $\xi>0$.


Proof. By hypothesis, the discussion of the game $\Gamma=\Gamma^\sigma(x_0,U_\varepsilon,V_\varepsilon)$ can be replaced by a
discussion of the game $\Gamma'=\langle t(\cdot,\cdot),\Phi_\sigma^*,\Psi_\sigma\rangle$, where $\Phi_\sigma^*$ is the set of successful strategies of
the player $P$. It is then easily shown that $\Gamma'$ is a stochastic univalent game, and it follows from [4]
that $\xi$-equilibrium situations exist in it for any $\xi>0$.


We shall now define the continuous time-optimal game I (2, U, V). In this game, at any 
instant t=[0, ) the values of t and x(t), and set the set M, are known to both players. 


Definition 9. We define the strategy y of the player Pin the game I'(2, U, V) as the 


collection 


p= (&, . {p.(U”, at Pe U'’'eK(U), vi'eK(v)); 


where E]S.., U’EK(U), {o(-)}o, uv’, vr’ is the set of P’s strategies in all possible games 
P(x, U0", V"), o&X., U“EK (VU), V’EK(V). 


We define player F’s strategy in a similar way: 
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y=(y, a {p.(U”, V") } cates, U''eK(U), viteK(v))« 


The pay-off in the situation (y, y) is defined as follows: 
H(q, ») =H? (Qo, Po), = O=EUN. 


Definition 10. The strategy y* of player P is called successful if, given any E’s strategy y, 
sup t(g*, p) <<, 


{>} 
The concept of a stationary strategy can be introduced in a similar way to that in Sec. 1. 


Now assume that the pay-off function is Lipschitz in the set in which it is finite. Then we 
have: 


Theorem 6 


If player P has a successful strategy in the game $\Gamma(x_0, U, V)$, then there are $\varepsilon$-equilibrium situations in the game for any $\varepsilon > 0$, in stationary strategies.


Proof. Instead of the game $\Gamma(x_0, U, V)$ it is sufficient to consider the game $\Gamma' = \langle t(\cdot, \cdot), \Phi^*, \Psi \rangle$. Since, by hypothesis, the pay-off function is Lipschitz, we can regard the game as one with a Lipschitz terminal pay-off function, specified in the terminal set $M$. But then, all the arguments of Section 1, with minor modifications, are applicable to the game, i.e., all the propositions of Section 1 hold, including the theorem on the existence of an equilibrium situation in stationary strategies.


Translated by D. E. Brown 
ON A CLASS OF MULTI-STAGE PROBLEMS OF 
STOCHASTIC OPTIMAL CONTROL* 


E. M. BERKOVICH 
Moscow 


(Received 27 June 1975) 


A CLASS of multi-stage stochastic optimal control problems, involving ordinary differential 
equations, is described. The replacement of the initial problem by finite-difference analogues is 
justified. A method for solving the special class of multi-stage problems, linear in the phase variable, 
is described. 


Multi-stage stochastic extremal problems [1, 2] describe familiar situations of decision-making in conditions of imperfect information, in engineering, economics, and other fields of human endeavour. In a number of previous papers (see e.g., [3]), finite difference methods for solving two-stage stochastic optimal control problems have been described and studied. In the present paper, similar methods are considered for a class of multi-stage problems with close similarities to those considered in [4].


1. Formulation of the problem 


We consider a controlled dynamic system, whose motion (evolution) is described by ordinary 
differential equations. We assume that the phase trajectory of the system, and the expenses involved 
in the chosen control, depend on a random “state of nature” with known probability characteristics, 
but with a realization which is unknown at the start of the motion. During the motion, additional 
information arrives, regarding the realizations of certain random parameters, connected with the 
dynamics of the system. The instants of arrival of the additional information split the total time of 
the motion into several stages, differing in the information pattern for the control selection. At the 
first stage no additional information is known and the control is sought as a determinate function of 
time. When selecting the control applied to the system in the subsequent stages, all the information 
which has arrived at the start of the stage is taken into account. As an estimate of the expense 
involved in the chosen control we take the expected value of the target functional under the condition 
that the controls applied to the system in the subsequent stages are optimal. It is required to choose, 
under these assumptions, the optimal controls of each stage. 


Let us state the problem formally. We are given the complete probability space $(\Omega, \mathcal{A}, P)$. The elements $\omega \in \Omega$ are interpreted as the random “states of nature”. The time interval $T_0 \le t \le T_N$ of the system motion is assumed to be given. The system state at any instant $t \in [T_0, T_N]$ is described by the $m$-dimensional vector $x(t)$, while the control applied to the system at this instant is described by the $r$-dimensional vector $u(t)$. We assume that, given any $r$-dimensional vector function $u = u(t)$, measurable in the interval $[T_0, T_N]$, and given any state of nature $\omega \in \Omega$, a unique phase trajectory $x(t; u, \omega)$, $t \in [T_0, T_N]$, of the system is defined; the trajectory is in fact an $m$-dimensional vector function, satisfying the equations of motion





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 52-63, 1977. 
$$\dot{x}(t; u, \omega) = f(x(t; u, \omega), u(t), t, \omega), \quad t \in [T_0, T_N], \qquad x(T_0; u, \omega) = x_0(\omega), \tag{1.1}$$


where $f(x, u, t, \omega)$ and $x_0(\omega)$ are given $m$-dimensional vector functions.


The controls have to be measurable functions of time, whose values belong at every instant to a given set $M \subset R^r$. For a fixed state of nature $\omega \in \Omega$ the expense of the chosen control $u = u(t)$, $t \in [T_0, T_N]$, is estimated by the number $F(x(T_N; u, \omega), \omega)$, where $F(x, \omega)$ is a given scalar function.


The movement time interval is divided by the fixed points $T_0 < T_1 < \ldots < T_{N-1} < T_N$ into $N$ stages $\Gamma_i = [T_{i-1}, T_i]$, $i = 1, 2, \ldots, N$. The points $T_i$, $i = 1, 2, \ldots, N-1$, define the instants of arrival of the additional information. The information arriving at the instant $T_i$, $1 \le i \le N-1$, is the realization $b_i$ of the random $q$-dimensional vector quantity, which depends in general on the realized state of nature $\omega$ and on the applied control. In particular, $b_i$ may be the result of measurements performed with a random error, on the state of the system at the instant $T_i$ (cf. problems of combining control and observation [5, 6, 7]). On the other hand, $b_i$ may characterise the part of the set $\Omega$ in which the realized value $\omega$ lies, so that a knowledge of $b_i$ improves the degree of information about the realized state of nature.


In short, the realization of the block random quantity $b^i = (b_1, \ldots, b_i)$ is known and can be used in selecting the control, at any $(i+1)$-th stage, $1 \le i \le N-1$.


Denote by $U_i$, $1 \le i \le N$, the set of admissible controls of the $i$-th stage, i.e., the set of all $r$-dimensional vector functions $u_i = u_i(t)$, measurable in the interval $\Gamma_i$ and satisfying the inclusions $u_i(t) \in M$, $t \in \Gamma_i$. We introduce the notation $u^i = (u_1, \ldots, u_i)$ for the collection of controls applied in the first $i$ stages, $1 \le i \le N$. Such a control will be called admissible if its component controls at each stage are admissible.


Consider the problem of choosing the optimal controls at the individual stages. We start with the last stage $\Gamma_N$. The control of this stage is chosen for fixed controls $u^{N-1}$ of the previous stages and fixed information $b^{N-1}$ arriving at the start of the stage. As an estimate of the expense involved in the control $u_N$ of the last stage, we take the expected value of the target functional, i.e., the quantity

$$I_N(u_N; u^{N-1}, b^{N-1}) = E_{\omega \mid b^{N-1}} F(x(T_N; u^N, \omega), \omega), \tag{1.2}$$

where $u^N = (u^{N-1}, u_N)$, and $E_{\omega \mid b^{N-1}}$ is the operator of conditional mathematical expectation. The problem of choosing the best control of the $N$-th stage for fixed $u^{N-1}$ and $b^{N-1}$ consists in minimizing the functional (1.2) with respect to $u_N$ in the set $U_N$. The quantity


$$I_N^*(u^{N-1}, b^{N-1}) = \inf_{u_N \in U_N} I_N(u_N; u^{N-1}, b^{N-1}) \tag{1.3}$$

is the estimate of the expense involved in the optimal control at the $N$-th stage for fixed $u^{N-1}$ and $b^{N-1}$.


We consider any $i$-th stage $\Gamma_i$, $2 \le i \le N-1$. When choosing the control of this stage, the controls $u^{i-1}$ of the previous stages are fixed, along with the information $b^{i-1}$ arriving at the start of the stage. Assume that we know, from the solution of the problem of the $(i+1)$-th stage, the estimate $I_{i+1}^*(u^i, b^i)$ of the expense involved in the optimal control at the $(i+1)$-th stage for fixed $u^i = (u^{i-1}, u_i)$, $b^i = (b^{i-1}, b_i)$. Since the realization $b_i$ at the $i$-th stage is not known, as an estimate of the expense involved in the control $u_i$ for fixed $u^{i-1}$ and $b^{i-1}$, we take the quantity


$$I_i(u_i; u^{i-1}, b^{i-1}) = E_{b_i \mid b^{i-1}} I_{i+1}^*(u^i, b^i). \tag{1.4}$$


The optimal control of the $i$-th stage must minimize the functional (1.4) in the set $U_i$ for fixed $u^{i-1}$, $b^{i-1}$. We put

$$I_i^*(u^{i-1}, b^{i-1}) = \inf_{u_i \in U_i} I_i(u_i; u^{i-1}, b^{i-1}). \tag{1.5}$$


In particular, from the solution of the problem at the second stage we find the estimate $I_2^*(u_1, b_1)$ of the expense involved in the control of the first stage $u_1$ and the information $b_1$ at the instant $T_1$, under the proviso that the controls at the subsequent stages be optimal. The problem of choosing the optimal control of the first stage, when no additional information has arrived, consists in minimizing the functional

$$I_1(u_1) = E_{b_1} I_2^*(u_1, b_1) \tag{1.6}$$

in the set $U_1$. Here, $E_{b_1}$ is the operator of unconditional mathematical expectation. The number


$$I^* = \inf_{u_1 \in U_1} I_1(u_1) = \inf_{u_1 \in U_1} E_{b_1} \inf_{u_2 \in U_2} E_{b_2 \mid b^1} \cdots \inf_{u_{N-1} \in U_{N-1}} E_{b_{N-1} \mid b^{N-2}} \inf_{u_N \in U_N} E_{\omega \mid b^{N-1}} F(x(T_N; u^N, \omega), \omega) \tag{1.7}$$

characterizes the mean expense involved in the optimal control at each stage. The collection of these parametric extremal problems for each stage is called the $N$-stage problem of stochastic optimal control.
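The nested structure of the mean expense (alternating infima over stage controls and conditional expectations over the arriving information) can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a finite set of states of nature, a finite control set shared by all stages, and an observation rule `observe` that partitions the states of nature; all function and variable names are hypothetical.

```python
def multistage_value(omegas, probs, controls, observe, cost, n_stages):
    """Toy evaluation of min_{u1} E_{b1} min_{u2} E_{b2|b1} ... min_{uN} E_{w|b} cost.

    omegas, probs : finite list of states of nature and their probabilities
    controls      : finite set of admissible control values (same at every stage)
    observe(i, us, w) : information b_i arriving after stage i (any hashable value)
    cost(us, w)   : target functional for the control tuple us and state w
    """
    def recurse(stage, past, consistent):
        # `consistent` lists the indices of states compatible with the
        # information received so far; `weight` is their total probability.
        weight = sum(probs[k] for k in consistent)
        best = float("inf")
        for u in controls:
            us = past + (u,)
            if stage == n_stages:
                # Last stage: conditional expectation of the target functional.
                value = sum(probs[k] * cost(us, omegas[k]) for k in consistent) / weight
            else:
                # Group the states by the observation that would arrive next,
                # then average the optimal continuation values over the groups.
                groups = {}
                for k in consistent:
                    groups.setdefault(observe(stage, us, omegas[k]), []).append(k)
                value = sum(
                    sum(probs[k] for k in g) / weight * recurse(stage + 1, us, g)
                    for g in groups.values()
                )
            best = min(best, value)
        return best

    return recurse(1, (), list(range(len(omegas))))


# Two equally likely states; the cost penalises the gap between the summed
# controls and the state. In the first call the state is revealed after
# stage 1, in the second no information arrives at all.
omegas, probs, controls = [-1, 1], [0.5, 0.5], [-1, 0, 1]
cost = lambda us, w: (sum(us) - w) ** 2
informed = multistage_value(omegas, probs, controls, lambda i, us, w: w, cost, 2)
blind = multistage_value(omegas, probs, controls, lambda i, us, w: None, cost, 2)
```

In this toy setting the informed value is 0 and the uninformed value is 1, which shows how additional information at the stage boundary lowers the mean expense.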


The feature of the problem for each stage is that the functional is in general specified 
implicitly. The problem admits of a variety of generalizations and modifications (see e.g., [4] ). 
Notice in particular that, if the information arriving at certain instants can contain errors as a result 
of purposive activity of an “opponent”, then the operators of mathematical expectation in the 
problems of the respective stages have to be replaced by lowest upper bound operators (cf. the 
principle of guaranteed result [8] ). 


2. Convergence of the difference approximations 


Let us construct finite-difference analogues of the $N$-stage problem of stochastic optimal control. For every sufficiently large integer $n$, $n \ge n^* = \mathrm{const} \ge N$, we consider a mesh, consonant with the $N$-stage property, in the interval $[T_0, T_N]$, with base-points $T_0 = t_{n0} < \ldots < t_{nn} = T_N$, the points $T_i$, $i = 0, 1, \ldots, N$, being included in the base-points: $T_i = t_{n, n_i}$, $i = 0, 1, \ldots, N$. The mesh base-points divide the interval $[T_0, T_N]$ into subintervals of length $\tau_{nj} = t_{n, j+1} - t_{nj}$, $j = 0, 1, \ldots, n-1$. The sequence of meshes is assumed to be canonical [9], i.e.,

$$\tau_n = \max_{0 \le j \le n-1} \tau_{nj} = O(n^{-1}).$$


The difference analogue of the control of the $i$-th stage, $1 \le i \le N$, is the $r$-dimensional mesh vector function $v_{(i)n} = (v_{n, n_{i-1}}, \ldots, v_{n, n_i - 1})$, $v_{nk} \in R^r$, $n_{i-1} \le k \le n_i - 1$. The mesh control of the first $i$ stages can be written in the block form $v_n^i = (v_{(1)n}, \ldots, v_{(i)n})$, as a collection of difference controls for each stage.


The difference analogue of the phase trajectory, corresponding to the mesh control $v_n = v_n^N = (v_{n0}, \ldots, v_{n, n-1})$ and a fixed state of nature $\omega \in \Omega$, is an $m$-dimensional mesh vector function, whose components $x_{nj}(v_n, \omega)$, $j = 0, 1, \ldots, n$, satisfy the equations

$$x_{n, j+1}(v_n, \omega) = x_{nj}(v_n, \omega) + \tau_{nj} f(x_{nj}(v_n, \omega), v_{nj}, t_{nj}, \omega), \quad j = 0, 1, \ldots, n-1, \qquad x_{n0}(v_n, \omega) = x_0(\omega). \tag{2.1}$$

We denote by $U_{in}$, $1 \le i \le N$, the set of admissible mesh controls of the $i$-th stage, i.e., the mesh vector functions $v_{(i)n} = (v_{n, n_{i-1}}, \ldots, v_{n, n_i - 1})$, whose components satisfy the inclusions $v_{nk} \in M$, $k = n_{i-1}, \ldots, n_i - 1$.


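The difference analogue of the problem at the $N$-th stage consists in minimizing, for fixed $v_n^{N-1}$ and $b^{N-1}$, the functional

$$I_{Nn}(v_{(N)n}; v_n^{N-1}, b^{N-1}) = E_{\omega \mid b^{N-1}} F(x_{nn}(v_n^N, \omega), \omega), \qquad v_n^N = (v_n^{N-1}, v_{(N)n}), \tag{2.2}$$

with respect to $v_{(N)n} \in U_{Nn}$. We put

$$I_{Nn}^*(v_n^{N-1}, b^{N-1}) = \inf_{v_{(N)n} \in U_{Nn}} I_{Nn}(v_{(N)n}; v_n^{N-1}, b^{N-1}). \tag{2.3}$$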


Assume that the difference problems for the $N$-th, $(N-1)$-th, ..., $(i+1)$-th stages, $2 \le i \le N-1$, have already been defined. The difference analogue of the problem at the $i$-th stage is the extremal problem on minimization, for fixed $v_n^{i-1}$ and $b^{i-1}$, of the functional

$$I_{in}(v_{(i)n}; v_n^{i-1}, b^{i-1}) = E_{b_i \mid b^{i-1}} I_{i+1, n}^*(v_n^i, b^i), \qquad v_n^i = (v_n^{i-1}, v_{(i)n}), \quad b^i = (b^{i-1}, b_i), \tag{2.4}$$

in the set $U_{in}$. We put

$$I_{in}^*(v_n^{i-1}, b^{i-1}) = \inf_{v_{(i)n} \in U_{in}} I_{in}(v_{(i)n}; v_n^{i-1}, b^{i-1}). \tag{2.5}$$


The difference problem at the 1st stage amounts to minimizing the functional

$$I_{1n}(v_{(1)n}) = E_{b_1} I_{2n}^*(v_{(1)n}, b_1) \tag{2.6}$$

in the set $U_{1n}$. The collection of the extremal problems for each stage forms a difference $N$-stage problem of stochastic optimal control. The quantity

$$I_n^* = \inf_{v_{(1)n} \in U_{1n}} I_{1n}(v_{(1)n}) \tag{2.7}$$

is the difference analogue of the mean optimal expense estimate (1.7).


Notice that the construction of the difference analogue of the Cauchy problem (1.1) on the 
basis of the Euler scheme (2.1) is chosen merely to achieve a clear-cut and convenient treatment. 
While the use of more exact difference schemes does not alter the fact proved below, of 
approximation with respect to the functional, it may increase the rate of convergence. 


We shall say that the sequence of multi-stage difference problems (2.1)-(2.7) approximates the initial multi-stage problem of stochastic optimal control with respect to the functional (cf. [3, 9]), if $I_n^* \to I^*$ as $n \to \infty$. Let us state some assumptions about the initial data of the problem posed in Section 1.


Assumption 1. The set $M$ is a bounded, convex, and closed subset of $R^r$.


Assumption 2. The function $f(x, u, t, \omega)$ satisfies a Lipschitz condition with respect to $x$, $u$, $t$, uniformly with respect to $\omega \in \Omega$; and for fixed $x$, $u$, $t$, it is, like $x_0(\omega)$, a bounded measurable function of $\omega \in \Omega$.


Assumption 3. The function $F(x, \omega)$ is measurable with respect to $\omega \in \Omega$ and is a uniformly continuous and bounded function of $x$ in every bounded subset of $R^m$.


The sufficiency of the conditions stated, for approximation of the initial multi-stage problem 
with respect to a functional, is proved by: 


Theorem 1 


Let Assumptions 1—3 hold. Then, the sequence of difference multi-stage problems (2.1)—(2.7) 
functional-wise approximates the multi-stage problem of stochastic optimal control posed in 
Section 1. 


The proof will be carried out in several steps, involving a number of auxiliary propositions. 


Lemma 1 


Let Assumptions 1-3 hold. Then, given any $i = 2, 3, \ldots, N$, the functional $I_i(u_i; u^{i-1}, b^{i-1})$, defined by Eqs. (1.2)-(1.5), is uniformly continuous in the norm of the space $L_1^r([T_0, T_i])$ in the set of admissible controls $u^i = (u^{i-1}, u_i)$, uniformly with respect to $b^{i-1}$. In addition, the functional $I_1(u_1)$ of (1.6) is uniformly continuous in the norm of $L_1^r(\Gamma_1)$ in the set $U_1$.


Proof. Notice first that, by Assumptions 1 and 2, the trajectories $x(t; u, \omega)$, $t \in [T_0, T_N]$, of the problem (1.1) define, uniformly with respect to $\omega \in \Omega$, a uniformly continuous mapping of the set of admissible controls into the space $C^m([T_0, T_N])$ of $m$-dimensional vector functions continuous in $[T_0, T_N]$. In view of this, and Assumption 3, the functional $I_N(u_N; u^{N-1}, b^{N-1})$ of (1.2) is uniformly continuous in the set of admissible controls $u^N = (u^{N-1}, u_N)$, uniformly with respect to $b^{N-1}$.


We now use induction. Assume that, for some $i = 2, 3, \ldots, N-1$, the functional $I_{i+1}(u_{i+1}; u^i, b^i)$ is uniformly continuous in the set of admissible controls $u^{i+1} = (u^i, u_{i+1})$, uniformly with respect to $b^i$. We shall show that the functional $I_{i+1}^*(u^i, b^i)$ is then also uniformly continuous in the set of admissible controls $u^i$, uniformly with respect to $b^i$. Let $u^i$ and $\bar u^i$ be admissible controls of the first $i$ stages, and let $\varepsilon$ be an arbitrary positive number. For each $b^i$ there exist controls $u_{i+1}$, $\bar u_{i+1}$ satisfying the inequalities

$$I_{i+1}(u_{i+1}; u^i, b^i) - I_{i+1}^*(u^i, b^i) < \varepsilon/2, \qquad I_{i+1}(\bar u_{i+1}; \bar u^i, b^i) - I_{i+1}^*(\bar u^i, b^i) < \varepsilon/2. \tag{2.8}$$

In addition, in view of the uniform continuity with respect to $u^{i+1} = (u^i, u_{i+1})$ of the functional $I_{i+1}(u_{i+1}; u^i, b^i)$, there exists $\delta > 0$ such that, if

$$\| u^i - \bar u^i \|_{L_1^r([T_0, T_i])} < \delta, \tag{2.9}$$

we have, for all $b^i$, the inequalities

$$|I_{i+1}(u_{i+1}; u^i, b^i) - I_{i+1}(u_{i+1}; \bar u^i, b^i)| < \varepsilon/2, \qquad |I_{i+1}(\bar u_{i+1}; u^i, b^i) - I_{i+1}(\bar u_{i+1}; \bar u^i, b^i)| < \varepsilon/2. \tag{2.10}$$

It follows from (2.8) and (2.10) that, when (2.9) holds, we have, for all $b^i$,

$$|I_{i+1}^*(u^i, b^i) - I_{i+1}^*(\bar u^i, b^i)| < \varepsilon, \tag{2.11}$$

and this last inequality shows that the functional $I_{i+1}^*(u^i, b^i)$ is uniformly continuous.


Applying the operator of conditional mathematical expectation to (2.11), we can prove the uniform continuity with respect to $u^i = (u^{i-1}, u_i)$ of the functional $I_i(u_i; u^{i-1}, b^{i-1})$, uniform with respect to $b^{i-1}$. This gives us the first part of the lemma. In particular, the functional $I_2(u_2; u_1, b_1)$ is shown to be uniformly continuous in the set of admissible controls $u^2 = (u_1, u_2)$, the continuity being uniform with respect to $b_1$. Hence it follows in turn that the functionals $I_2^*(u_1, b_1)$ and $I_1(u_1)$ are uniformly continuous with respect to $u_1 \in U_1$. Lemma 1 is proved.


For any $i = 1, 2, \ldots, N$ and any mesh control $v_{(i)n} \in U_{in}$, we denote by $P_n v_{(i)n}$ its piecewise constant continuation into the interval $\Gamma_i$. Obviously, $P_n v_{(i)n} \in U_i$. For any block mesh control $v_n^i = (v_{(1)n}, \ldots, v_{(i)n})$ we put $P_n v_n^i = (P_n v_{(1)n}, \ldots, P_n v_{(i)n})$. From Assumptions 1 and 2 and the difference analogue of Gronwall's lemma, we obtain (cf. [9]):


Lemma 2 


Given any $\omega \in \Omega$ and any sequence of admissible mesh controls $v_n^N$, $n \ge n^*$, we have

$$\max_{0 \le i \le n} |x(t_{ni}; P_n v_n^N, \omega) - x_{ni}(v_n^N, \omega)| \to 0 \quad \text{as } n \to \infty.$$
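The closeness of the Euler mesh trajectory (2.1) to the continuous trajectory under the piecewise constant continuation of the same mesh control can be checked numerically on a simple case. The following sketch assumes a scalar test equation $dx/dt = -x + u$ on $[0, 1]$ with a uniform mesh and an alternating control; this right-hand side, the initial value 0.5 and all function names are illustrative choices, not taken from the paper.

```python
import math


def euler_trajectory(f, x0, controls, t0, t_end):
    """Scheme (2.1) on a uniform mesh: x_{j+1} = x_j + tau * f(x_j, v_j, t_j)."""
    n = len(controls)
    tau = (t_end - t0) / n
    xs = [x0]
    for j, v in enumerate(controls):
        xs.append(xs[-1] + tau * f(xs[-1], v, t0 + j * tau))
    return xs


def exact_trajectory(x0, controls, t0, t_end):
    """Exact mesh values of dx/dt = -x + u for the piecewise constant control.

    On a subinterval where u equals v the solution is
    x(t + tau) = v + (x(t) - v) * exp(-tau).
    """
    n = len(controls)
    tau = (t_end - t0) / n
    xs = [x0]
    for v in controls:
        xs.append(v + (xs[-1] - v) * math.exp(-tau))
    return xs


def max_mesh_error(n):
    """Largest gap between the Euler and exact trajectories at the base-points."""
    controls = [1.0 if j % 2 == 0 else -1.0 for j in range(n)]
    f = lambda x, u, t: -x + u
    approx = euler_trajectory(f, 0.5, controls, 0.0, 1.0)
    exact = exact_trajectory(0.5, controls, 0.0, 1.0)
    return max(abs(a - b) for a, b in zip(approx, exact))
```

Calling `max_mesh_error(n)` for increasing `n` gives a quantity that shrinks as the mesh is refined, in line with the statement of Lemma 2 for this particular example.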


Closeness of the continuous and difference phase trajectories implies a definite degree of 
closeness between the values of the corresponding functionals. 


Lemma 3 


For all $i = 2, 3, \ldots, N$, any value of $b^{i-1}$, and any sequence of admissible mesh controls of the first $i$ stages $v_n^i$, $n \ge n^*$, we have the inequalities

$$\varlimsup_{n \to \infty} \{ I_i(P_n v_{(i)n}; P_n v_n^{i-1}, b^{i-1}) - I_{in}(v_{(i)n}; v_n^{i-1}, b^{i-1}) \} \le 0. \tag{2.12}$$

In addition, given any sequence of mesh controls of the first stage $v_{(1)n} \in U_{1n}$, $n \ge n^*$, we have

$$\varlimsup_{n \to \infty} \{ I_1(P_n v_{(1)n}) - I_{1n}(v_{(1)n}) \} \le 0. \tag{2.13}$$


Proof. Notice that, by Assumption 3 and Lemma 2, the inequality (2.12) holds for $i = N$. Let us show that, by virtue of (2.12), we have

$$\varlimsup_{n \to \infty} \{ I_i^*(P_n v_n^{i-1}, b^{i-1}) - I_{in}^*(v_n^{i-1}, b^{i-1}) \} \le 0. \tag{2.14}$$

In fact, for any $b^{i-1}$ there exists a sequence of mesh controls $v_{(i)n} \in U_{in}$, $n \ge n^*$, for which

$$\lim_{n \to \infty} \{ I_{in}(v_{(i)n}; v_n^{i-1}, b^{i-1}) - I_{in}^*(v_n^{i-1}, b^{i-1}) \} = 0. \tag{2.15}$$

Since $I_i^*(P_n v_n^{i-1}, b^{i-1}) \le I_i(P_n v_{(i)n}; P_n v_n^{i-1}, b^{i-1})$ for all $n$, we obtain (2.14) from (2.12) and (2.15). It follows from (2.14) and Fatou's lemma that relation (2.12) remains valid when $i$ is replaced by $i - 1$. Hence (2.12) is proved for all $i = 2, 3, \ldots, N$. Hence, in the light of what has been proved, (2.14) holds for $i = 2$; and in turn, this implies that (2.13) holds. Lemma 3 is proved.


A particular consequence of Lemma 3 is the inequality

$$\varlimsup_{n \to \infty} \{ I^* - I_n^* \} \le 0. \tag{2.16}$$

To prove Theorem 1, it now only remains to show that

$$\varlimsup_{n \to \infty} \{ I_n^* - I^* \} \le 0. \tag{2.17}$$


For the proof, we require two further lemmas. Let us first introduce some notation. Given any continuous function $u_i \in U_i$ we denote by $Q_n u_i = v_{(i)n}$ the mesh control for the $i$-th stage, representing the “projection onto the mesh” of the control $u_i$, i.e., $v_{nk} = u_i(t_{nk})$, $k = n_{i-1}, \ldots, n_i - 1$, $1 \le i \le N$.

Obviously, $Q_n u_i \in U_{in}$. For the continuous control of the first $i$ stages $u^i = (u_1, \ldots, u_i)$ we put $Q_n u^i = (Q_n u_1, \ldots, Q_n u_i)$. We can prove the following in the same way as in [9]:


Lemma 4 

Given any $\omega \in \Omega$ and any continuous admissible control $u^N = u^N(t)$, $T_0 \le t \le T_N$, we have

$$\max_{0 \le i \le n} |x_{ni}(Q_n u^N, \omega) - x(t_{ni}; u^N, \omega)| \to 0 \quad \text{as } n \to \infty.$$

Using a similar method to that when proving Lemma 3, we can prove from Lemmas 1 and 4:


Lemma 5 


For all $i = 2, 3, \ldots, N$, any value of $b^{i-1}$, and any admissible continuous control of the first $i$ stages $u^i = (u_1, \ldots, u_i)$ we have

$$\varlimsup_{n \to \infty} \{ I_{in}(Q_n u_i; Q_n u^{i-1}, b^{i-1}) - I_i(u_i; u^{i-1}, b^{i-1}) \} \le 0.$$

In addition, for any continuous function $u_1 \in U_1$ we have

$$\varlimsup_{n \to \infty} \{ I_{1n}(Q_n u_1) - I_1(u_1) \} \le 0. \tag{2.18}$$


Since, by Lemma 1, the functional $I_1(u_1)$ is uniformly continuous in the set $U_1$, and by Assumption 1, the set $M$ is convex and closed, it can easily be shown, in the same way as in [9], that a sequence of functions, continuous in $\Gamma_1$, exists, minimizing the functional $I_1(u_1)$ in the set $U_1$. In view of this and (2.18), we obtain the inequality (2.17), which, jointly with (2.16), is equivalent to the equation $\lim_{n \to \infty} I_n^* = I^*$; hence Theorem 1 is proved.


Notice that, under certain assumptions about the smoothness of the optimal controls of the initial multi-stage problem, we can guarantee the convergence rate estimate $|I_n^* - I^*| = O(\tau_n)$ as $n \to \infty$. We can arrange for a higher rate of convergence by using difference analogues, more exact than (2.1), of the Cauchy problem (1.1).


The approximating difference problems enable us to construct minimizing sequences for the 
initial problems of optimal control for each stage. Particular examples of these sequences are the 
piecewise constant continuations of the mesh controls, which more and more exactly solve the 
relevant difference problems. Multi-stage stochastic problems, like determinate optimal control 
problems, may belong to the class of ill-posed variational problems [10]. If it is necessary to 
construct sequences, convergent to the optimal controls, the difference approximating problems 
can be subjected to Tikhonov regularization (cf. [3, 9]). 


3. Difference multi-stage problems, linear in the phase variable 


We shall consider the special class of multi-stage problems, for which the equation of motion 
and the target functional depend linearly on the system phase state vector. In addition, we shall 
assume that the equations of motion for each stage are completely defined by the additional 
information arriving at the start of the stage. In the case described, the problem for each stage 
amounts to solving a sequence of relatively simple auxiliary extremal problems. 


Let the initial state of the system be a given determinate vector $x_0 = x_{n_0}$. In the present section, 
the difference mesh with respect to the time axis is assumed to be fixed, and to simplify the notation, 
we shall omit the auxiliary index n, indicating the number of the mesh base-points, in the phase 
state and control vectors, and in the functionals and sets. 


Assume that the difference analogue of the equations of motion of the system at the $i$-th stage, $1 \le i \le N$, for a fixed value $b^{i-1}$ of the additional information arriving at the start of the stage, is

$$x_{k+1} = A_k(b^{i-1}) x_k + B_k(v_k, b^{i-1}), \qquad k = n_{i-1}, \ldots, n_i - 1. \tag{3.1}$$

Here, $A_k(b^{i-1})$ are given $(m \times m)$ matrices, and $B_k(v_k, b^{i-1})$ are $m$-dimensional vector functions. To unify the notation, we write $b^0$ for a fixed (e.g., the zero) vector.


At the first stage, for $n_0 = 0 \le k < n_1$, the matrices $A_k$ in (3.1) are determinate, while the vectors $B_k$ depend only on the control $v_k$. At the second stage, for $n_1 \le k < n_2$, they depend on the random quantity $b_1$, the realization of which is known at this stage, etc. For a fixed initial state $x_{n_{i-1}}$ of the system at the $i$-th stage, and a chosen control $v_{(i)} = (v_{n_{i-1}}, \ldots, v_{n_i - 1})$, we can find from Eqs. (3.1) for every vector $b^{i-1}$ the unique difference phase trajectory $x_k = x_k(v_{(i)}, x_{n_{i-1}}, b^{i-1})$, $k = n_{i-1}, \ldots, n_i$. The function $F(x, \omega)$, defining the target functional, is also linear in $x$:

$$F(x, \omega) = (\alpha(\omega), x) + \beta(\omega),$$

where $\alpha(\omega)$ is an $m$-dimensional vector quantity, and $\beta(\omega)$ is a given scalar.


The special feature of the present situation lies in the fact that the extent of the information of the person choosing the control, at any given stage, about the controls of the previous stages, amounts to exact specification of the phase state of the system at the start of the given stage. For instance, at the last ($N$-th) stage, the system state $x_{n_{N-1}}$ at the start of this stage is known, and so is the supplementary information $b^{N-1}$ arriving during the motion. The difference problem of the $N$-th stage amounts to minimizing, with respect to $v_{(N)} \in U_N$ for fixed $x_{n_{N-1}}$ and $b^{N-1}$, the functional

$$I_N(v_{(N)}; x_{n_{N-1}}, b^{N-1}) = (\alpha_N(b^{N-1}), x_{n_N}) + \beta_N(b^{N-1}), \tag{3.2}$$

where $x_{n_N} = x_{n_N}(v_{(N)}, x_{n_{N-1}}, b^{N-1})$, while the $m$-dimensional vector function $\alpha_N(b^{N-1})$ and the scalar function $\beta_N(b^{N-1})$ are given by the relations

$$\alpha_N(b^{N-1}) = E_{\omega \mid b^{N-1}}\, \alpha(\omega), \qquad \beta_N(b^{N-1}) = E_{\omega \mid b^{N-1}}\, \beta(\omega). \tag{3.3}$$

We put

$$I_N^*(x_{n_{N-1}}, b^{N-1}) = \inf_{v_{(N)} \in U_N} I_N(v_{(N)}; x_{n_{N-1}}, b^{N-1}).$$

At any $i$-th stage, $1 \le i \le N-1$, the initial state of the system $x_{n_{i-1}}$ and the input additional information $b^{i-1}$ are known. The optimal control of the $i$-th stage minimizes, with respect to $v_{(i)} \in U_i$ for fixed $x_{n_{i-1}}$ and $b^{i-1}$, the functional

$$I_i(v_{(i)}; x_{n_{i-1}}, b^{i-1}) = E_{b_i \mid b^{i-1}} I_{i+1}^*(x_{n_i}, b^i), \tag{3.4}$$

where $x_{n_i} = x_{n_i}(v_{(i)}, x_{n_{i-1}}, b^{i-1})$, while $I_{i+1}^*(x_{n_i}, b^i)$ is the lower bound of the functional in the problem of the $(i+1)$-th stage.


We put

$$I_i^*(x_{n_{i-1}}, b^{i-1}) = \inf_{v_{(i)} \in U_i} I_i(v_{(i)}; x_{n_{i-1}}, b^{i-1}). \tag{3.5}$$

In particular, the number $I^* = I_1^*(x_{n_0}, b^0) = I_1^*(x_0, b^0)$ plays a role similar to that of (2.7), namely, the role of mean estimate of the expense in the multi-stage difference problem.


Let us introduce some notation. For every $i = 1, 2, \ldots, N$ and for an arbitrary vector $b^{i-1}$, we denote by $\lambda_k^{(i)}(b^{i-1}) \in R^m$, $k = n_{i-1}, \ldots, n_i$, the solution of the following “conjugate system” of the $i$-th stage:

$$\lambda_{n_i}^{(i)}(b^{i-1}) = -\alpha_i(b^{i-1}), \qquad \lambda_k^{(i)}(b^{i-1}) = A_k^T(b^{i-1})\, \lambda_{k+1}^{(i)}(b^{i-1}), \quad k = n_i - 1, \ldots, n_{i-1}. \tag{3.6}$$

Here, $T$ denotes matrix transposition, the $m$-dimensional vector function $\alpha_i(b^{i-1})$ is given, for $i = N$, by the condition (3.3), and for the other $i = 1, 2, \ldots, N-1$, is defined by the recurrence relation

$$\alpha_i(b^{i-1}) = -E_{b_i \mid b^{i-1}}\, \lambda_{n_i}^{(i+1)}(b^i), \qquad i = 1, 2, \ldots, N-1. \tag{3.7}$$


We shall assume that, for every $i = 1, 2, \ldots, N$, and any $b^{i-1}$, there exists a mesh control $v_{(i)}^*(b^{i-1}) = (v_{n_{i-1}}^*, \ldots, v_{n_i - 1}^*) \in U_i$, whose components are the solutions of the following supplementary extremal problems:

$$\gamma_k^{(i)}(b^{i-1}) = \max_{v_k \in M} \bigl(\lambda_{k+1}^{(i)}(b^{i-1}), B_k(v_k, b^{i-1})\bigr), \qquad k = n_{i-1}, \ldots, n_i - 1. \tag{3.8}$$

We also define the scalar functions $\beta_i(b^{i-1})$, $i = 1, 2, \ldots, N$, by the relations

$$\beta_i(b^{i-1}) = E_{b_i \mid b^{i-1}} \Bigl[ \beta_{i+1}(b^i) - \sum_{k=n_i}^{n_{i+1}-1} \gamma_k^{(i+1)}(b^i) \Bigr], \qquad i = 1, 2, \ldots, N-1, \tag{3.9}$$

where the functions $\gamma_k^{(i)}(b^{i-1})$ are defined by condition (3.8), and $\beta_N(b^{N-1})$ by condition (3.3). It is assumed throughout that the application of the operators of mathematical expectation is valid, though this can be proved under fairly natural assumptions.


In problems for which the arrival of the additional information narrows the domain of possible realizations of the state of nature, the random quantities $b_i$ as a rule have discrete distributions with a finite number of possible realizations. In such cases, the operations of finding the mathematical expectations can be performed in a particularly simple way.


Theorem 2 


Under the above assumptions, there exists, in the problem of any $i$-th stage, $1 \le i \le N$, an optimal control which is independent of the controls of the previous stages, and is determined solely by the supplementary information $b^{i-1}$ arriving at the start of the stage. Such an optimal control is, in particular, the mesh function $v_{(i)}^*(b^{i-1})$, whose components solve the auxiliary extremal problems (3.8).


Proof. We shall first show that, if the functional of the $i$-th stage problem is

$$I_i(v_{(i)}; x_{n_{i-1}}, b^{i-1}) = (\alpha_i(b^{i-1}), x_{n_i}) + \beta_i(b^{i-1}), \tag{3.10}$$

where $x_{n_i} = x_{n_i}(v_{(i)}, x_{n_{i-1}}, b^{i-1})$, then its minimum with respect to $v_{(i)} \in U_i$ for fixed $x_{n_{i-1}}$ and $b^{i-1}$ is achieved on the mesh control $v_{(i)}^*(b^{i-1})$. Using (3.6) and (3.1), with arbitrary $v_{(i)} = (v_{n_{i-1}}, \ldots, v_{n_i - 1})$ we can transform (3.10) to

$$I_i(v_{(i)}; x_{n_{i-1}}, b^{i-1}) = -\bigl(\lambda_{n_{i-1}}^{(i)}(b^{i-1}), x_{n_{i-1}}\bigr) + \beta_i(b^{i-1}) - \sum_{k=n_{i-1}}^{n_i - 1} \bigl(\lambda_{k+1}^{(i)}(b^{i-1}), B_k(v_k, b^{i-1})\bigr). \tag{3.11}$$

It in fact follows from (3.11) that the functional (3.10) is minimized on the control $v_{(i)}^*(b^{i-1})$ where, by (3.5) and (3.8),

$$I_i^*(x_{n_{i-1}}, b^{i-1}) = -\bigl(\lambda_{n_{i-1}}^{(i)}(b^{i-1}), x_{n_{i-1}}\bigr) + \beta_i(b^{i-1}) - \sum_{k=n_{i-1}}^{n_i - 1} \gamma_k^{(i)}(b^{i-1}). \tag{3.12}$$

It remains to show that the functional of the $i$-th stage problem, $1 \le i \le N$, in fact has the form (3.10). With $i = N$, (3.10) holds by virtue of (3.2). We then argue by induction. Assume that (3.10) holds for some $i$, $2 \le i \le N$. Then, by (3.4), (3.7), (3.9), and (3.12), the representation (3.10) also holds for the functional of the $(i-1)$-th stage. Theorem 2 is proved.
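The passage from (3.10) to (3.11) in the proof rests on a telescoping sum. A short derivation is written out here for a single stage, with the argument $b^{i-1}$ suppressed and the stage superscript on $\lambda$ dropped for brevity; it uses only the equations of motion (3.1) and the conjugate system (3.6).

```latex
\begin{align*}
(\lambda_{k+1}, x_{k+1})
  &= (\lambda_{k+1}, A_k x_k + B_k(v_k))
   = (A_k^{T}\lambda_{k+1}, x_k) + (\lambda_{k+1}, B_k(v_k)) \\
  &= (\lambda_k, x_k) + (\lambda_{k+1}, B_k(v_k)),
     \qquad k = n_{i-1}, \ldots, n_i - 1, \\
(\lambda_{n_i}, x_{n_i})
  &= (\lambda_{n_{i-1}}, x_{n_{i-1}})
     + \sum_{k=n_{i-1}}^{n_i-1} (\lambda_{k+1}, B_k(v_k)), \\
(\alpha_i, x_{n_i}) + \beta_i
  &= -(\lambda_{n_i}, x_{n_i}) + \beta_i
   = -(\lambda_{n_{i-1}}, x_{n_{i-1}}) + \beta_i
     - \sum_{k=n_{i-1}}^{n_i-1} (\lambda_{k+1}, B_k(v_k)).
\end{align*}
```

Only the last sum depends on the control, and it separates over $k$, which is why the minimization reduces to the pointwise problems (3.8).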


In the case when the number of possible realizations of $b_i$ is finite and not unduly large, Theorem 2 provides the basis for the following method of solving the multi-stage problem posed in the present section. In advance of starting the motion of the system, at the stage of processing the a priori information, the conjugate systems (3.6) and the auxiliary extremal problems (3.8) are solved. As a result, the controls $v_{(i)}^*(b^{i-1})$ will be constructed for all possible values of $b^{i-1}$, $i = 1, 2, \ldots, N$. This preliminary stage can prove to be lengthy and laborious, though not particularly high standards are demanded at this stage concerning the operational properties of the choice of controls. Then, during the motion of the system, the operating side can quickly react to the arrival of the additional information by choosing the previously calculated appropriate control; these tactics prove to be optimal in the mean when an operation is repeated with sufficient frequency.
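The precomputation described above can be sketched for a small case. The code below is an illustration under simplifying assumptions that are not in the paper: the phase state is scalar ($m = 1$), there are two stages, the control set $M$ is finite so that the maxima in (3.8) are found by enumeration, and $b_1$ takes finitely many values listed with their probabilities. All function names are hypothetical; a brute-force enumeration is included only as a cross-check.

```python
from itertools import product


def solve_stage(alpha, beta, A, B, M, ks):
    """One stage: conjugate system (3.6) and pointwise problems (3.8), scalar case.

    The stage functional is alpha * x_end + beta, with dynamics
    x_{k+1} = A(k) * x_k + B(k, v) on the steps ks.
    Returns (lambda at stage start, beta - sum of gammas, optimal controls).
    """
    lam = -alpha                      # terminal condition of (3.6)
    total_gamma, controls = 0.0, []
    for k in reversed(ks):
        # At this point lam holds lambda_{k+1}; solve (3.8) by enumeration.
        v_best = max(M, key=lambda v: lam * B(k, v))
        total_gamma += lam * B(k, v_best)
        controls.append(v_best)
        lam = A(k) * lam              # lambda_k = A_k^T lambda_{k+1}
    controls.reverse()
    return lam, beta - total_gamma, controls


def two_stage_value(x0, M, ks1, ks2, A1, B1, scenarios):
    """scenarios: list of (prob, alpha2, beta2, A2, B2), one per value of b_1."""
    alpha1, beta1, policy = 0.0, 0.0, []
    for prob, alpha2, beta2, A2, B2 in scenarios:
        lam, const, ctrl = solve_stage(alpha2, beta2, A2, B2, M, ks2)
        alpha1 += prob * (-lam)       # recurrence (3.7)
        beta1 += prob * const         # recurrence (3.9)
        policy.append(ctrl)           # precomputed second-stage control per b_1
    lam, const, ctrl1 = solve_stage(alpha1, beta1, A1, B1, M, ks1)
    return -lam * x0 + const, ctrl1, policy   # mean expense as in (3.12)


def brute_force(x0, M, ks1, ks2, A1, B1, scenarios):
    """Direct evaluation of min over v1 of E_b min over v2, for comparison."""
    def run(x, A, B, ks, vs):
        for k, v in zip(ks, vs):
            x = A(k) * x + B(k, v)
        return x

    best = float("inf")
    for v1 in product(M, repeat=len(ks1)):
        x1 = run(x0, A1, B1, ks1, v1)
        mean = sum(
            p * min(a * run(x1, A2, B2, ks2, v2) + b
                    for v2 in product(M, repeat=len(ks2)))
            for p, a, b, A2, B2 in scenarios
        )
        best = min(best, mean)
    return best
```

The cost of `two_stage_value` grows linearly in the number of mesh steps and realizations, whereas the brute-force search grows exponentially in the number of steps; this is the practical content of reducing each stage to the auxiliary problems (3.8).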


Translated by D. E. Brown 
AN EXISTENCE THEOREM IN A MINIMAX CONTROL PROBLEM* 


N. S. VASIL’EV 


A PROBLEM posed by N. N. Moiseev is considered; it can be treated as an application of the principle of the maximum guaranteed result when undetermined factors are present [1]. Necessary conditions for optimality, in the form of a Pontryagin maximum principle, have been discussed on several occasions (see [2-4]). In this connection, it becomes necessary to obtain existence theorems in problems of this kind.


1. The problem 


Given a controlled process with a parameter

$$\frac{dx}{dt} = f(x, u(t), v), \tag{1}$$

where $f(x, u, v)$ is a vector function, bounded in the set $X \times P \times Q \subset E^{n+p+q}$; $t$ is the time, $t \ge 0$; $x = (x_1, \ldots, x_n)$ is the phase vector, lying in the set $X$ of the space $E^n$; $u(t) = (u_1(t), \ldots, u_p(t))$ is the controlling vector function with values in the set $P \subset E^p$; $v = (v_1, \ldots, v_q)$ is a parameter, belonging to the set $Q \subset E^q$.


We fix an arbitrary vector $x^0 \in X$ and an arbitrary positive number $T$.


Definition. The control u(t) is called admissible if it is Lebesgue measurable in the time 
interval [0, 7], takes values from the set P, and for all parameter values from the set Q, there exists 
a solution of the given system of differential equations, defined in the interval [0, 7] with the 
initial condition x°, the trajectory of which lies in the set X. 


Let $\Omega$ denote the set of admissible controls. We assume that $\Omega$ is not empty. Then the following functional is defined in the set of solutions of the system of differential equations, corresponding to controls of $\Omega$:

$$J(u(t), v) = \int_0^T f^0(x(t), u(t), v)\,dt, \tag{2}$$

where the integrand is defined in the set $X \times P \times Q$. The question arises as to the conditions in which an admissible control exists, on which the minimum is reached in the expression

$$\inf_{\Omega} \sup_{v \in Q} J(u(t), v).$$

Whatever admissible control is fixed, it is easy to arrange for the functional to reach its maximum 
with respect to the parameter. 
Proposition. If $\Omega$ is not empty, $X \times P \times Q$ is a compactum in $E^{n+p+q}$, and the right-hand side of the system and the integrand are continuous with respect to their sets of variables in the set $X \times P \times Q$, then, given any $u(t)$ of the set $\Omega$,

$$\max_{v \in Q} J(u(t), v)$$

is reached for some value of the parameter.


Proof. We fix an admissible control u(t) and show that the functional is continuous with 
respect to the parameter in the compactum Q. 


Since $X$ is compact, the set of solutions $x_k(t)$, $k = 1, 2, \ldots$, of the system of differential equations with the parameter $v_k$, $k = 1, 2, \ldots$, is uniformly bounded, and in the light of the inequality

$$|x_k(t_2) - x_k(t_1)| \le \int_{t_1}^{t_2} |f(x_k(t), u(t), v_k)|\,dt \le \max_{X \times P \times Q} |f(x, u, v)|\,|t_2 - t_1|$$

for $k = 1, 2, \ldots$, is equicontinuous. By the Ascoli-Arzelà theorem, the set $\{x_k(t),\ k = 1, 2, \ldots\}$ is relatively compact in the space of continuous functions, and it can therefore be assumed that the $x_k(t)$ are convergent in the metric of the space of continuous functions to a function $\bar x(t)$. Since the set $Q$ is compact, we can assume that the $v_k$ converge to the parameter $\bar v$, belonging to the set $Q$.


Since the right-hand side of system (1) is continuous, for all $t$ we have the convergence

$$f_i(x_k(t), u(t), v_k) \to f_i(\bar x(t), u(t), \bar v) \quad \text{as } k \to \infty, \quad i = 1, 2, \ldots, n,$$

and all the terms of this sequence are uniformly bounded by the constant

$$\max_{X \times P \times Q} |f_i(x, u, v)|.$$

Then, by Lebesgue's theorem on passage to the limit under the integral sign, we get

$$x_k(t) = x^0 + \int_0^t f(x_k(\tau), u(\tau), v_k)\,d\tau \to \bar x(t) = x^0 + \int_0^t f(\bar x(\tau), u(\tau), \bar v)\,d\tau.$$

Hence $\bar x(t)$ is a solution of the system with the parameter $\bar v$.


For a similar reason, the right-hand side of the inequality

$$|J(u(t), v_k) - J(u(t), \bar{v})| \le \int_0^T |f^0(x_k(t), u(t), v_k) - f^0(\bar{x}(t), u(t), \bar{v})|\,dt$$

tends to zero. Hence the proposition follows.


Since the class $\Omega$ is not in general a compact set, the minimum may not be reached in the
expression

$$\inf_{\Omega}\ \max_{Q} J(u(t), v)$$

for smooth functions in (1) and (2), and compact sets $P$ and $Q$.


Example. $dx/dt = x\,(v\,u(t) + (1 - v)(1 - u(t))^2)$, $x(0) = 1$, $0 \le t \le 1$, $P = Q = [0, 1]$,

$$J(u(t), v) = \int_0^1 (x(t) - e^{t/2})^2\,dt.$$

The set $\Omega$ is the same as the set of Lebesgue-measurable functions taking values in the
interval [0, 1], since the solution may be continued indefinitely on such controls.


Given any admissible control, the functional reaches its maximum with respect to the
parameter. This follows from our proposition, if we note that the phase variable does not leave a
compact interval of the straight line $R$, no matter what the admissible control or the parameter
(see the corollary to the theorem).

Let us show that, in our example, we have

$$\inf_{\Omega}\ \max_{v\in Q} J(u(t), v) = 0 \qquad (3)$$

and that there is no admissible control realizing this.


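We take the following sequence of piecewise constant controls, taking only the two alternative
values 0 and 1:

$$u_n(t) = \begin{cases} 0, & l/n \le t < (2l+1)/2n, \\ 1, & (2l+1)/2n \le t < (l+1)/n, \end{cases} \qquad l = 0, 1, \ldots, n-1,$$

that is, $u_n(t) = 0$ for $0 \le t < 1/2n$, $u_n(t) = 1$ for $1/2n \le t < 1/n$, $u_n(t) = 0$ for
$1/n \le t < 3/2n$, and so on up to $u_n(t) = 1$ for $1 - 1/2n \le t \le 1$.

We can write the parametrically dependent sequence of solutions of the differential equation
$x_n(t, v)$, $n = 1, 2, \ldots$, corresponding to these controls, in the explicit form

$$x_n(t, v) = \begin{cases} \exp\left[\dfrac{l}{2n} + (1 - v)\left(t - \dfrac{l}{n}\right)\right], & \dfrac{l}{n} \le t < \dfrac{2l+1}{2n}, \\[2ex] \exp\left[\dfrac{l + 1 - v}{2n} + v\left(t - \dfrac{2l+1}{2n}\right)\right], & \dfrac{2l+1}{2n} \le t < \dfrac{l+1}{n}, \end{cases}$$

or, equivalently,

$$x_n(t, v) = e^{t/2}\,h_n(t, v),$$

$$h_n(t, v) = \begin{cases} \exp\left[\dfrac{1 - 2v}{2}\left(t - \dfrac{l}{n}\right)\right], & \dfrac{l}{n} \le t < \dfrac{2l+1}{2n}, \\[2ex] \exp\left[-\dfrac{1 - 2v}{2}\left(t - \dfrac{l+1}{n}\right)\right], & \dfrac{2l+1}{2n} \le t < \dfrac{l+1}{n}. \end{cases}$$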
From the inequalities $e^{-1/4n} \le h_n(t, v) \le e^{1/4n}$, $n = 1, 2, \ldots$, there follows the convergence,
uniform with respect to time and the parameter, of $x_n(t, v)$ to the function $e^{t/2}$ as $n \to \infty$, whence
we obtain Eq. (3).
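The convergence just described can be checked numerically. The following is only an illustrative sketch: it assumes that each control $u_n$ equals 0 on the first half of every period $[l/n, (l+1)/n)$ and 1 on the second half (the order of the two values is an assumption made here), integrates the equation exactly on each half-period, and approximates $J$ by the midpoint rule on a grid of parameter values. All function names are hypothetical.

```python
import math


def chattering_state(t: float, v: float, n: int) -> float:
    """State x_n(t, v) for the control that is 0, then 1, on each period of length 1/n.

    While u = 0 the growth rate of x is (1 - v); while u = 1 it is v, so
    log x is a sum of rate * duration over the half-periods already traversed.
    """
    half = 1.0 / (2 * n)
    j = min(int(t / half), 2 * n - 1)   # index of the half-period containing t
    log_x = (j // 2) * half             # each full period adds (1 - v) * half + v * half
    rest = t - j * half
    if j % 2 == 0:                      # inside a half-period with u = 0
        log_x += (1.0 - v) * rest
    else:                               # a full u = 0 half-period, then part of a u = 1 one
        log_x += (1.0 - v) * half + v * rest
    return math.exp(log_x)


def cost(v: float, n: int, steps: int = 2000) -> float:
    """Midpoint-rule approximation of J = int_0^1 (x_n(t, v) - e^{t/2})^2 dt."""
    dt = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (chattering_state(t, v, n) - math.exp(t / 2)) ** 2
    return total * dt


def worst_case_cost(n: int, v_points: int = 21) -> float:
    """Maximum of the cost over a uniform grid of parameter values v in [0, 1]."""
    return max(cost(i / (v_points - 1), n) for i in range(v_points))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, worst_case_cost(n))
```

The printed worst-case values should decrease towards zero as $n$ grows, in line with Eq. (3), although no single control attains the value zero.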


Assume that an admissible control $u(t)$ exists, for which

$$\max_{v\in Q} J(u(t), v) = 0.$$

This can only occur under the condition $x(t, v) = e^{t/2}$ for all values of the parameter and of
time, where $x(t, v)$ satisfies the equation with the control $u(t)$. Hence, substituting the function
$e^{t/2}$ in the differential equation, we arrive finally at the identity

$$e^{t/2}/2 = e^{t/2}\,(v\,u(t) + (1 - v)(1 - u(t))^2),$$

$$v\,[u(t) - 1/2] + (1 - v)\,[(1 - u(t))^2 - 1/2] = 0, \quad 0 \le t \le 1, \quad 0 \le v \le 1.$$

On taking the parameter equal to 0 and 1, we find respectively that the following mutually
contradictory equations need to be satisfied:

$$(1 - u(t))^2 = 1/2 \quad \text{and} \quad u(t) = 1/2.$$


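Note. Given any fixed value of the parameter $v \in [0, 1]$, there exists an admissible control on
which

$$\min_{u(t)\in\Omega} J(u(t), v) = 0$$

is realized. The following is an example of such an optimal control:

$$u(v, t) = \begin{cases} \dfrac{2 - 3v}{2(1 - v)} - \dfrac{(5v^2 - 6v + 2)^{1/2}}{2(1 - v)}, & 0 \le v < 1/2, \\[2ex] \dfrac{2 - 3v}{2(1 - v)} + \dfrac{(5v^2 - 6v + 2)^{1/2}}{2(1 - v)}, & 1/2 \le v < 1, \\[2ex] 1/2, & v = 1. \end{cases}$$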

Let us show that $u(v, t)$ is an admissible control. It is easily shown that the control is
non-negative. For $0 \le v < 1/2$, from the inequalities $2 - 3v < 2(1 - v) < 2(1 - v) + (5v^2 - 6v + 2)^{1/2}$
we obtain $u(v, t) < 1$. For $1/2 \le v \le 1$, the fact that $u(v, t) \le 1$ is easily seen from the estimates

$$u(1/2, t) = 1, \qquad u(1, t) = 1/2 < 1,$$

$$\frac{\partial u(v, t)}{\partial v} = \frac{2v - 1 - (5v^2 - 6v + 2)^{1/2}}{2(1 - v)^2\,(5v^2 - 6v + 2)^{1/2}} \ne 0, \qquad 1/2 < v < 1.$$

Thus, $u(v, t)$ is an admissible control, and in view of the equation $J(u(v, t), v) = 0$ the
control is optimal, since $J(u(t), v) \ge 0$, $u(t) \in \Omega$.
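As a quick check of the note, the constant control can be evaluated directly. The sketch below assumes the control is the root of the quadratic $v\,u + (1 - v)(1 - u)^2 = 1/2$ that lies in [0, 1], taking the branch with the minus sign for $v < 1/2$, the branch with the plus sign for $1/2 \le v < 1$, and the limiting value 1/2 at $v = 1$; the function names are hypothetical.

```python
import math


def constant_control(v: float) -> float:
    """Constant control u solving v*u + (1 - v)*(1 - u)**2 = 1/2 for a fixed v in [0, 1]."""
    if v == 1.0:
        return 0.5                      # the equation degenerates to u = 1/2
    root = math.sqrt(5 * v * v - 6 * v + 2)
    sign = -1.0 if v < 0.5 else 1.0     # branch of the quadratic that stays in [0, 1]
    return (2 - 3 * v + sign * root) / (2 * (1 - v))


def growth_rate(u: float, v: float) -> float:
    """Coefficient of x on the right-hand side of dx/dt in the example."""
    return v * u + (1 - v) * (1 - u) ** 2


if __name__ == "__main__":
    for i in range(0, 11):
        v = i / 10
        u = constant_control(v)
        print(v, u, growth_rate(u, v))
```

With the growth rate equal to 1/2 for every $t$, the solution with $x(0) = 1$ is $e^{t/2}$, so the functional vanishes for that fixed value of the parameter.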
2. Existence theorem


Let us give the conditions for the existence of a solution in the minimax control problem
stated above.


Theorem


Let the right-hand side of the system (1) have the form $f(x, u, v) = C(x, v)\,u + d(x, v)$, where
the components of the matrix $C(x, v)$ and of the vector $d(x, v)$ are continuous, along with their
partial derivatives with respect to $x$, in the set $X \times Q$. We assume that $X$, $P$, and $Q$ are compact.
In addition, $P$ is a convex set. We assume that the integrand in (2) is continuous in $X \times P \times Q$,
convex with respect to $u$ in the set $P$ for all fixed $(x, v) \in X \times Q$, and satisfies a Lipschitz
condition with respect to $(x, u) \in X \times P$ with constant $L$ for any fixed $v \in Q$.


If the set $\Omega$ is not empty, then the expression

$$\min_{\Omega}\ \max_{Q} J(u(t), v)$$

reaches its minimum on some admissible control.


Proof. We shall first show that the set $\Omega$ is weakly compact. We fix an arbitrary parameter of
the set $Q$.


If we take a sequence of admissible controls $u_k(t)$, $k = 1, 2, \ldots$, weakly convergent
to the measurable function $\bar{u}(t)$, then it is sufficient to show that $\bar{u}(t)$ belongs to the set $\Omega$, since
the set of all measurable controls with values in a convex compactum is weakly compact [5]. This
means that $\bar{u}(t)$ takes values from the set $P$. The set of solutions $x_k(t)$ of the system of differential
equations corresponding to the controls $u_k(t)$ is relatively compact (see the proof of our
proposition). Hence we can assume that the sequence $x_k(t)$ is uniformly convergent to a function
$\bar{x}(t)$. Using this fact, along with the weak convergence of the sequence $u_k(t)$ to $\bar{u}(t)$, and the
linearity of the system of differential equations with respect to the control, we obtain

$$x_k(t) = x^0 + \int_0^t [C(x_k, v)\,u_k + d(x_k, v)]\,d\tau \;\to\; x^0 + \int_0^t [C(\bar{x}, v)\,\bar{u} + d(\bar{x}, v)]\,d\tau, \quad k \to \infty,$$

$$\lim_{k\to\infty} x_k(t) = \bar{x}(t), \quad t \in [0, T].$$

The last equation implies that $\bar{x}(t)$ is the solution of the system of differential equations with
the control $\bar{u}(t)$ and any fixed parameter $v$. Since the uniform convergence to $\bar{x}(t)$ holds in the
interval $[0, T]$, and the set $X$ is compact, $\bar{x}(t)$ is defined in the same time interval and does not
leave $X$.


For any fixed value of the parameter $v$, we shall show that the functional is weakly lower
semi-continuous with respect to a control of $\Omega$.


Let $u_k(t)$, $k = 1, 2, \ldots$, be a sequence of admissible controls, and $x_k(t)$, $k = 1, 2, \ldots$,
the sequence of corresponding trajectories, weakly convergent to $\bar{u}(t)$ and uniformly convergent to
the solution $\bar{x}(t)$ of the system with the control $\bar{u}(t)$, respectively. Let us show that the auxiliary
functional

$$I(u(t)) = \int_0^T f^0(\bar{x}(t), u(t), v)\,dt$$

is weakly lower semi-continuous in the set of Lebesgue-measurable functions with values in $P$. Since
the integrand is convex in $P$, we can easily see that the functional is convex with respect to the
control, since $\bar{x}(t)$ and $v$ are fixed. The auxiliary functional is continuous with respect to convergence
of functions in $L_2[0, T]$. This follows from the inequality

$$|I(\bar{u}) - I(u)| \le \int_0^T |f^0(\bar{x}, \bar{u}, v) - f^0(\bar{x}, u, v)|\,dt \le L\,T^{1/2}\left(\int_0^T |\bar{u}(t) - u(t)|^2\,dt\right)^{1/2}.$$

The weak lower semi-continuity of the auxiliary functional is obtained by applying the following
theorem of functional analysis: a convex functional is weakly lower semi-continuous in a convex
set of a Banach space if and only if it is lower semi-continuous in this set (see [6]).


In short, we have obtained the relation

$$\liminf_{k\to\infty} I(u_k(t)) \ge I(\bar{u}(t)).$$

Using the following estimate for the difference between the values of the initial and the auxiliary
functional for the chosen sequence of controls:

$$|J(u_k(t), v) - I(u_k(t))| \le \int_0^T |f^0(x_k, u_k, v) - f^0(\bar{x}, u_k, v)|\,dt \le L\int_0^T |x_k - \bar{x}|\,dt \to 0, \quad k \to \infty,$$

we obtain the equation

$$\liminf_{k\to\infty} J(u_k(t), v) = \liminf_{k\to\infty} I(u_k(t)),$$

while from the definition of the auxiliary functional we obtain the equation $I(\bar{u}(t)) = J(\bar{u}(t), v)$.


On discarding the auxiliary functional, we get

$$\liminf_{k\to\infty} J(u_k(t), v) \ge J(\bar{u}(t), v) \quad \forall v \in Q, \qquad (4)$$

which implies that the functional (2) is weakly lower semi-continuous with respect to the control.


Given any admissible control, the functional has a maximum with respect to the parameter
(see our proposition). We shall show that

$$\max_{v\in Q} J(u(t), v) \qquad (5)$$

is also a weakly lower semi-continuous functional in the set $\Omega$. In fact, if the sequence of admissible
controls $u_k(t)$, $k = 1, 2, \ldots$, is weakly convergent to $\bar{u}(t)$, then, by what has been proved,
inequality (4) holds. Since

$$\max_{v\in Q} J(u_k(t), v) \ge J(u_k(t), v),$$

we also have

$$\liminf_{k\to\infty}\ \max_{Q} J(u_k(t), v) \ge \liminf_{k\to\infty} J(u_k(t), v) \quad \forall v \in Q.$$

On combining these inequalities, we see that

$$\liminf_{k\to\infty}\ \max_{Q} J(u_k(t), v) \ge \max_{Q} J(\bar{u}(t), v).$$

If, to the functional (5), defined in the weakly compact set $\Omega$, we apply the theorem of
functional analysis to the effect that a weakly lower semi-continuous functional reaches its
greatest lower bound in a weakly compact set, we complete the proof (see [7]).


Corollary. The theorem remains true if the set $X$ is replaced by the space $E^n$, and the
assumption that the set of admissible controls is not empty is replaced by the condition

$$\|C(x, v)\| \le c_1(1 + |x|), \quad \|d(x, v)\| \le c_2(1 + |x|) \quad \forall (x, v) \in E^n \times Q, \qquad (6)$$

where $\|C(x, v)\|$ and $\|d(x, v)\|$ denote the norms of the matrix $C(x, v)$ and of the vector
$d(x, v)$, and $c_1$, $c_2$ are positive constants.


Proof. Given any Lebesgue-measurable control with values in $P$, and any parameter of $Q$, the
system of differential equations has a solution, defined in a certain segment (see [8]).


Satisfaction of the inequality (6) ensures that the solution can be continued into the interval
$[0, T]$, so that the set $\Omega$ is the same as the set of Lebesgue-measurable functions in the interval
$[0, T]$, with values in $P$.


Let us show that all the possible trajectories, corresponding to controls of $\Omega$ and parameters
of $Q$, cannot leave a sphere of the space $E^n$. It can be assumed that the inequalities (6) hold in the
Euclidean norm. We denote by $y(t)$ the Euclidean norm of the phase variable $x(t)$.


We then easily obtain the inequality

$$dy^2(t)/dt = 2\,(x(t),\ C(x(t), v)\,u(t) + d(x(t), v)) \le 2c_3\,y\,(1 + y),$$

where the constant

$$c_3 = c_1 \max_{u\in P} |u| + c_2$$

and $(g, f)$ denotes the scalar product of vectors $g$ and $f$.


After integrating the differential inequality, we obtain an estimate, independent of the choice
of admissible control and parameter: $y(t) \le (1 + |x^0|)\,e^{c_3 T} - 1$. It remains to use the theorem.
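The integration step can be spelled out as follows. This is a sketch that writes $c_3$ for the constant $c_1 \max_{u\in P}|u| + c_2$ (the symbol is chosen here for convenience) and assumes $y(t) > 0$ where the inequality is divided by $y$; at points where $y = 0$ the final bound holds trivially.

```latex
\begin{align*}
\frac{dy^2}{dt} = 2y\,\frac{dy}{dt} \le 2c_3\,y\,(1+y)
  &\;\Longrightarrow\; \frac{dy}{dt} \le c_3\,(1+y)
   \;\Longrightarrow\; \frac{d}{dt}\ln(1+y) \le c_3, \\
\ln\frac{1+y(t)}{1+y(0)} \le c_3\,t
  &\;\Longrightarrow\; y(t) \le (1+|x^0|)\,e^{c_3 t} - 1 \le (1+|x^0|)\,e^{c_3 T} - 1,
  \qquad 0 \le t \le T.
\end{align*}
```

Here $y(0) = |x^0|$, so the bound depends only on the initial state, the constants in (6), the set $P$, and the length of the time interval.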


3. Some generalizations 


If, instead of a parameter, we use measurable vector functions $v(t)$ with values from $Q$, we
can easily see that, if all the other conditions in the theorem are satisfied, an admissible control
exists, realizing

$$\min_{\Omega}\ \sup_{v(t)} J(u(t), v(t)).$$

If, at the same time, we consider a system of differential equations of the type
$dx/dt = C(x)\,u(t) + D(x)\,v(t)$ and require in addition that the set $Q$ be convex and the function
$f^0(x, u, v)$ be concave with respect to $v \in Q$ for all fixed $(x, u) \in X \times P$, then we can prove in a
similar way that there exist an admissible control $u(t)$ and a function $v(t)$, such that

$$\min_{u(t)}\ \max_{v(t)} J(u(t), v(t))$$

is reached on these functions.


This last result generalizes the result of [9], where a problem in a similar formulation was
considered.


Translated by D. E. Brown 
A METHOD OF EVALUATING THE STATIONARY POINTS OF A GENERAL
PROBLEM OF NON-LINEAR PROGRAMMING*


N. A. BOGOMOLOV and V. G. KARMANOV
Moscow
(Received 26 March 1976)

THE USE of the method of feasible directions (mfd) for finding the points of local minima of a
non-convex function in a non-convex set is considered. The successive approximations are shown
to be convergent to the set of stationary points, and in particular, to the set of points at which
the necessary conditions for a local minimum are satisfied.


To find the points of local minima in a general problem of non-linear programming, the only
realistic approach is to use a relaxation method in which, when finding the direction of descent from
a point $x$, account is taken solely of the local properties of the function requiring minimization, and
the local properties of the set in which this function is defined. Of the available deterministic methods,
the mfd satisfies these conditions [1-4].


Consider the problem of finding the minima of the differentiable function $\varphi(x)$ in a closed
set $X$ of $n$-dimensional Euclidean space.





*Zh. vychisl. Mat. mat. Fiz., 17, 72-78, 1977. 
Let $X = \{x \mid f_i(x) \ge 0,\ i = 1, 2, \ldots, m\}$, where $f_i(x)$ are given differentiable functions. The
mfd consists in constructing a sequence of points $\{x_k\}$ from the expression $x_{k+1} = x_k - \beta_k s_k$. Here,
$x_k \in X$ is the point evaluated at the previous iteration, $-s_k$ is a feasible direction at the point
$x_k$, i.e., a direction such that small displacements along it from the point $x_k$ do not go outside the
set $X$, and $\beta_k$ defines the step length. Here, $s_k$ and $\beta_k$ are chosen in such a way that

$$\varphi(x_{k+1}) \le \varphi(x_k), \quad k = 0, 1, \ldots.$$


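We shall consider the auxiliary problem of finding the number $\sigma$ and the vector $s$ such that

$$\sigma \to \max,$$

$$\langle f_i'(x), s\rangle + \sigma \le 0, \quad i \in I, \qquad -\langle \varphi'(x), s\rangle + \sigma \le 0, \qquad (1)$$

$$\langle s, s\rangle \le 1.$$

Here, $\langle \psi'(x), s\rangle$ is the scalar product of the gradient of the function $\psi(x)$ and the vector $s$.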


Let $\bar\sigma(x, \varepsilon)$ and $\bar{s}(x, \varepsilon)$ denote the solutions of problem (1) for
$I = I(x, \varepsilon) = \{i : 0 \le f_i(x) \le \varepsilon\}$, where $\varepsilon$ is a positive number, and let
$\bar\sigma(x, 0)$ and $\bar{s}(x, 0)$ be the solutions of problem (1) for $I = I(x, 0) = \{i : f_i(x) = 0\}$.
In order for the direction $-s$, $\|s\| = 1$, to be feasible at the point $x \in X$, it is sufficient that
$\sigma > 0$ and $s$ satisfy the inequalities $\langle f_i'(x), s\rangle + \sigma \le 0$ for all $i \in I(x, \varepsilon)$ for at
least one $\varepsilon > 0$ (see e.g., [4]). Let the direction $-s$ be feasible at the point $x \in X$.
We define the distance $\zeta$ from the point $x$ to the nearest boundary point of the set $X$ along the
direction $-s$. Since the direction $-s$ is feasible at the point $x$, a number $\bar\beta > 0$ exists such that the
point $x - \beta s \in X$ for all $\beta \in [0, \bar\beta]$. The quantity $\zeta = \sup \bar\beta$ (if it is finite) denotes the length of the
maximum interval $[x, x - \zeta s]$ belonging entirely to the set $X$. Here, $y = x - \zeta s$ is a boundary point
of the set $X$. If $\zeta = +\infty$, then the ray $x - \beta s$, $\beta \ge 0$, belongs to the set $X$. If $x = x_k$ and $s = s_k$, we shall
write $\zeta = \zeta_k$.


Scheme of the method. As the initial approximation $x_0$ we can choose any element of the
set $X$, while $\varepsilon_0$ is chosen from the semi-interval (0, 1]. Assume that $x_k$ and $\varepsilon_k$ have been evaluated
as a result of the $k$-th iteration. Let us describe the $(k + 1)$-th iteration.


Step A. On solving problem (1) for $I = I(x_k, \varepsilon_k)$, we can find the admissible $\sigma_k$ and $s_k$, $\|s_k\| = 1$,
such that $\sigma_k \ge \xi_k\,\bar\sigma(x_k, \varepsilon_k)$, where $0 < \xi \le \xi_k \le 1$.


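Step B. If $\sigma_k > \varepsilon_k$, we evaluate $\beta_k$. Usually, the $\beta_k$ are evaluated by solving a problem of
one-dimensional minimization. Then, $\beta_k$ has to satisfy the conditions

$$\varphi(x_k - \beta_k s_k) \le (1 - \lambda_k)\,\varphi(x_k) + \lambda_k \omega_k, \quad 0 < \lambda \le \lambda_k \le 1, \qquad (2)$$

$$\omega_k = \inf_{0 \le \beta \le \zeta_k} \varphi(x_k - \beta s_k).$$

The numbers $\beta_k$ can also be chosen as follows. Let $\bar\beta_k$ be the maximum of the numbers which
satisfy the relations

$$\varphi(x_k) - \varphi(x_k - \beta_k s_k) \ge \tfrac{1}{2}\,\beta_k \sigma_k, \quad 0 \le \beta_k \le \zeta_k. \qquad (3)$$

As $\beta_k$ we can take any number which satisfies the inequalities (3) and the condition $\beta_k \ge \alpha \bar\beta_k$ for
a fixed $\alpha \in (0, 1]$.


Finally, we evaluate $x_{k+1} = x_k - \beta_k s_k$, put $\varepsilon_{k+1} = \varepsilon_k$ and pass to step A.


Step C. If $0 < \sigma_k \le \varepsilon_k$, we put $x_{k+1} = x_k$, $\varepsilon_{k+1} = \gamma_k \varepsilon_k$, where $0 < \gamma_k \le \gamma < 1$, and we pass to
step A. If $\sigma_k = 0$, we evaluate $\bar\sigma(x_k, 0)$ by solving problem (1) for $I = I(x_k, 0)$. If $\bar\sigma(x_k, 0) = 0$, the
process is terminated. Otherwise, we put $x_{k+1} = x_k$, $\varepsilon_{k+1} = \gamma_k \varepsilon_k$, where $0 < \gamma_k \le \gamma < 1$, and pass to
step A.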
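Steps A to C can be sketched in a few lines for a two-dimensional problem. The code below is only an illustration under stated assumptions, not the authors' implementation: the test problem (minimize the squared distance to the point (2, 1) over the unit disc), the scan over unit vectors used as an inexact solver of problem (1), the bisection for $\zeta_k$, the halving rule for $\beta_k$ based on (3), and the stopping tolerance on $\varepsilon_k$ (used in place of the exact test $\bar\sigma(x_k, 0) = 0$) are all choices made here, and every function name is hypothetical.

```python
import math


# Hypothetical test problem: minimise phi over the unit disc {x : f1(x) >= 0}.
def phi(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2


def grad_phi(x):
    return (2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0))


def f1(x):
    return 1.0 - x[0] ** 2 - x[1] ** 2


def grad_f1(x):
    return (-2.0 * x[0], -2.0 * x[1])


CONSTRAINTS = [(f1, grad_f1)]


def move(x, beta, s):
    """Point x - beta * s."""
    return (x[0] - beta * s[0], x[1] - beta * s[1])


def solve_direction(x, eps, angles=720):
    """Step A: inexact solution (sigma, s) of problem (1) with I = I(x, eps), scanning unit vectors s."""
    active = [g(x) for f, g in CONSTRAINTS if 0.0 <= f(x) <= eps]
    gp = grad_phi(x)
    best_sigma, best_s = 0.0, (1.0, 0.0)
    for j in range(angles):
        a = 2.0 * math.pi * j / angles
        s = (math.cos(a), math.sin(a))
        sigma = gp[0] * s[0] + gp[1] * s[1]          # from -<phi', s> + sigma <= 0
        for g in active:                             # from <f_i', s> + sigma <= 0
            sigma = min(sigma, -(g[0] * s[0] + g[1] * s[1]))
        if sigma > best_sigma:
            best_sigma, best_s = sigma, s
    return best_sigma, best_s


def boundary_distance(x, s, cap=10.0, iters=60):
    """Bisection estimate of zeta: a feasible step length close to the boundary along -s."""
    def feasible(b):
        return all(f(move(x, b, s)) >= 0.0 for f, _ in CONSTRAINTS)
    if feasible(cap):
        return cap
    lo, hi = 0.0, cap
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def step_length(x, s, sigma, zeta):
    """Step B: halve beta from zeta until the decrease condition (3) holds."""
    beta = zeta
    while beta > 1e-14:
        if phi(x) - phi(move(x, beta, s)) >= 0.5 * beta * sigma:
            return beta
        beta *= 0.5
    return 0.0


def feasible_directions(x0, eps0=0.5, gamma=0.5, tol=1e-6, max_iter=500):
    x, eps = x0, eps0
    for _ in range(max_iter):
        if eps < tol:                                # practical stopping rule (assumption)
            break
        sigma, s = solve_direction(x, eps)           # step A
        if sigma > eps:                              # step B: move, keep eps
            zeta = boundary_distance(x, s)
            x = move(x, step_length(x, s, sigma, zeta), s)
        else:                                        # step C: stay, shrink eps
            eps *= gamma
    return x


if __name__ == "__main__":
    x = feasible_directions((0.0, 0.0))
    print(x, phi(x))
```

For this convex test problem the stationary point is the projection of (2, 1) onto the disc, so the printed value of the objective should be close to $(\sqrt{5} - 1)^2$. Because the direction search is only a coarse scan, the result is approximate.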


The convergence of our method will be proved under the following assumptions.

Condition 1. The functions $\varphi(x)$ and $f_i(x)$, $i = 1, 2, \ldots, m$, belong to the class* $C^{1,1}(X)$.
Condition 2. A number $M > 0$ exists, such that $\|f_i'(x)\| \le M$ for all $x \in X$, $i = 1, 2, \ldots, m$.
Condition 3. The set $X^* = \{x^* \in X \mid \bar\sigma(x^*, 0) = 0\}$ is not empty.
Condition 4. $\inf_{x\in X} \varphi(x) > -\infty$.
Condition 5. The sequence $\{x_k\}$ is compact.


Let us explain condition 3. If the function $\varphi(x)$ and the set $X$ are convex and satisfy Slater's
condition, then the condition $\bar\sigma(x^*, 0) = 0$ is necessary and sufficient for $\varphi(x)$ to have a global
minimum at the point $x^*$. If only the set $X$ is convex, then $X^*$ is the set of points at which the
necessary conditions for a local minimum are satisfied. In the general case, to these points may
also be added a series of others, e.g., the points at which no feasible directions exist. In the present
paper the convergence of the sequence $\{x_k\}$ to the set of stationary points $X^*$ is investigated,
so that condition 3 is natural.


Notice that conditions (2) and (3) ensure that the sequence $\{\varphi(x_k)\}$ is monotonically
non-increasing.


The convergence will be proved under the assumption that the sequence $\{x_k\}$ is infinite, since
otherwise, in accordance with step C, $\bar\sigma(x_k, 0) = 0$, i.e., $x_k \in X^*$.


*If $\psi(x) \in C^{1,1}(X)$, then a number $L > 0$ exists such that, given any interval $[x, y]$ belonging entirely to the set
$X$, we have $\|\psi'(x) - \psi'(y)\| \le L\|x - y\|$. Since the number of functions $\varphi(x)$, $f_i(x)$ is finite, a Lipschitz constant
common to all the functions will exist.


Lemma 1

For all $\sigma$ and $s$, satisfying the conditions

$$\langle f_i'(x), s\rangle + \sigma \le 0, \quad i \in I(x, \varepsilon), \quad x \in X, \quad \langle s, s\rangle \le 1, \qquad (4)$$

we have

$$\zeta \ge C\varepsilon, \qquad (5)$$

where $C = \min\{1/M, 1/L\}$.
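Proof. Obviously, it is sufficient to take the case $\zeta < +\infty$. Since the point $y = x - \zeta s$ is on the
boundary (by definition of $\zeta$), a number $i$ will exist such that $f_i(y) = 0$. If $f_i(x) > \varepsilon$ at the point $x$,
then $\varepsilon < f_i(x) = |f_i(x) - f_i(y)| \le M\|x - y\| \le M\zeta$, whence

$$\zeta \ge \varepsilon/M. \qquad (6)$$

Assume that $0 \le f_i(x) \le \varepsilon$, i.e., $i \in I(x, \varepsilon)$ at the point $x$. Put $\psi_i(\beta) = f_i(x - \beta s)$ and notice that
$\psi_i(\beta) \ge 0$ for $\beta \in [0, \zeta]$ and $\psi_i(\zeta) = 0$. It can easily be seen that $d\psi_i(\beta)/d\beta\,|_{\beta=\zeta} \le 0$, i.e.,
$\langle f_i'(y), s\rangle \ge 0$. From the condition $i \in I(x, \varepsilon)$ and the condition that $\sigma$ and $s$ satisfy system (4),
we have $\langle f_i'(x), s\rangle \le -\sigma$, so that

$$\varepsilon \le \sigma \le -\langle f_i'(x), s\rangle \le \langle f_i'(y), s\rangle - \langle f_i'(x), s\rangle \le \|f_i'(y) - f_i'(x)\|\,\|s\| \le L\|y - x\| \le L\zeta,$$

whence

$$\zeta \ge \varepsilon/L. \qquad (7)$$

From (6) and (7) we obtain $\zeta \ge \varepsilon \min\{1/M, 1/L\}$, i.e., the inequality (5).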


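Lemma 2


If the point $x_{k+1}$ is constructed in accordance with the scheme of the method, and $\sigma_k \ge \varepsilon_k > 0$,
we have

$$\varphi(x_k) - \varphi(x_{k+1}) \ge \tfrac{1}{2}\,C\alpha\lambda\,\varepsilon_k^2 \qquad (8)$$

for fixed $\alpha \in (0, 1]$ and $\lambda \in (0, 1]$.


The proof follows from the inequality*

$$\varphi(x_k) - \varphi(x_{k+1}) \ge \tfrac{1}{2}\,\alpha\lambda\,\sigma_k \min\{\zeta_k,\ \sigma_k/L\} \qquad (9)$$

and inequality (5).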
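*See [4], p. 239.


Lemma 3


For any point $\bar{x} \in X$, numbers $\bar\varepsilon = \bar\varepsilon(\bar{x}) > 0$ and $\delta = \delta(\bar\varepsilon) > 0$ exist such that, for all
$\varepsilon \in [0, \bar\varepsilon]$ and $x \in U_\delta(\bar{x}) = \{x \in X : \|x - \bar{x}\| \le \delta\}$, we have $I(x, \varepsilon) \subset I(\bar{x}, 0)$.


Proof. Put $J = \{1, 2, \ldots, m\}$ and let us choose

$$\bar\varepsilon = \frac{1}{2} \min_{i \in J \setminus I(\bar{x}, 0)} f_i(\bar{x}).$$

Notice that $\bar\varepsilon > 0$, since, for $i \in J \setminus I(\bar{x}, 0)$, we have $f_i(\bar{x}) > 0$.


Since the functions $f_i(x)$ are continuous and the number of indices $i$, $i \le m$, is finite, a
number $\delta(\bar\varepsilon) > 0$ will exist such that, for all $x \in U_\delta(\bar{x})$ and all $i \in J$, we have $|f_i(\bar{x}) - f_i(x)| < \bar\varepsilon$.
Since $f_i(\bar{x}) \ge 2\bar\varepsilon$, $i \in J \setminus I(\bar{x}, 0)$, then $f_i(x) > \bar\varepsilon$ for all $x \in U_\delta(\bar{x})$ and all $i \in J \setminus I(\bar{x}, 0)$. Hence,
for $\varepsilon \in [0, \bar\varepsilon]$, for any $i \in J \setminus I(\bar{x}, 0)$ we have $i \in J \setminus I(x, \varepsilon)$, and hence we get the inclusion

$$I(x, \varepsilon) \subset I(\bar{x}, 0).$$

Let $\bar\sigma = \bar\sigma(\bar{x}, 0)$ and $\bar{s} = \bar{s}(\bar{x}, 0)$ be the solutions of the problem (1) for $x = \bar{x}$ and
$I = I(\bar{x}, 0)$.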
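Lemma 4


If $I(x, \varepsilon) \subset I(\bar{x}, 0)$ and $\bar\sigma > 0$, then a number $\delta = \delta(\bar\sigma) > 0$ exists, such that, for all
$x \in U_\delta(\bar{x})$, we have $\bar\sigma(x, \varepsilon) \ge \bar\sigma/2$.


Proof. Since $\varphi'(x)$ and $f_i'(x)$, $i = 1, 2, \ldots, m$, are continuous, a number $\delta(\bar\sigma) > 0$
will exist, such that, for all $x \in U_\delta(\bar{x})$, we have $\|\varphi'(x) - \varphi'(\bar{x})\| \le \bar\sigma/2$ and
$\|f_i'(x) - f_i'(\bar{x})\| \le \bar\sigma/2$, $i \in I(\bar{x}, 0)$. Given any $x \in U_\delta(\bar{x})$ and any $i \in I(x, \varepsilon) \subset I(\bar{x}, 0)$,

$$0 \ge \langle f_i'(\bar{x}), \bar{s}\rangle + \bar\sigma = \langle f_i'(x), \bar{s}\rangle + \bar\sigma + \langle f_i'(\bar{x}) - f_i'(x), \bar{s}\rangle$$

$$\ge \langle f_i'(x), \bar{s}\rangle + \bar\sigma - \|f_i'(\bar{x}) - f_i'(x)\|\,\|\bar{s}\| \ge \langle f_i'(x), \bar{s}\rangle + \bar\sigma/2,$$

and similarly,

$$0 \ge -\langle \varphi'(x), \bar{s}\rangle + \bar\sigma/2.$$

Hence $\sigma = \bar\sigma/2$ and $s = \bar{s}$ will satisfy the conditions of problem (1) with $I = I(x, \varepsilon)$ for any $x \in U_\delta(\bar{x})$.
But $\bar\sigma(x, \varepsilon)$ and $\bar{s}(x, \varepsilon)$ represent the solution of the problem of maximizing $\sigma$ under the same
conditions, so that $\bar\sigma(x, \varepsilon) \ge \bar\sigma/2$.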


Theorem


If conditions 1-5 are satisfied, then

$$\lim_{k\to\infty} \rho(x_k, X^*) = \lim_{k\to\infty}\ \inf_{x^* \in X^*} \|x_k - x^*\| = 0.$$

Proof. Let $K$ be the collection of all indices of the sequence $\{x_k\}$: $K = \{k = 0, 1, \ldots\}$. Assume
that $\varepsilon_k \ge \varepsilon > 0$ for some $\varepsilon > 0$ and for all $k \ge k_1$. Then, in accordance with the scheme of our
method, a number $k_0 \in K$ will exist, such that $\sigma_k > \varepsilon_k \ge \varepsilon > 0$ for all $k \ge k_0$. From (5) we have
$\zeta_k \ge C\varepsilon_k \ge C\varepsilon$, and then, in view of (8), we obtain $\varphi(x_k) - \varphi(x_{k+1}) \ge C\alpha\lambda\varepsilon^2/2$ for all $k \ge k_0$.

But the sequence $\{\varphi(x_k)\}$ is convergent (since it is monotonic and bounded), which contradicts
the previous inequality. In short, $\varepsilon_k \to 0$, $k \to \infty$. In accordance with the scheme of the method,
a collection of indices $K_1 \subset K$ exists, such that $\sigma_k \to 0$, $k \in K_1$, $k \to \infty$.


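We shall now show that all the limit points of the sequence $\{x_k\}$ belong to the set $X^*$.


We consider two cases.


Case 1. Let $\bar{x}$ be the unique limit point, i.e.,

$$\lim_{k\to\infty} x_k = \bar{x} \in X.$$

Assume that $\bar{x} \in X \setminus X^*$. Then, $\bar\sigma(\bar{x}, 0) = \bar\sigma > 0$, and hence, by Lemma 3, numbers $\bar\varepsilon > 0$ and
$\delta(\bar\varepsilon) > 0$ exist such that $I(x, \varepsilon) \subset I(\bar{x}, 0)$ for all $\varepsilon \in [0, \bar\varepsilon]$ and $x \in U_\delta(\bar{x})$. Since $\varepsilon_k \to 0$ and
$x_k \to \bar{x}$, $k \to \infty$, a number $k_0$ will exist such that $x_k \in U_\delta(\bar{x})$ and $\varepsilon_k \le \bar\varepsilon$ for all $k \ge k_0$, and hence
$I(x_k, \varepsilon_k) \subset I(\bar{x}, 0)$. But then, by Lemma 4, the inequality $\bar\sigma(x_k, \varepsilon_k) \ge \bar\sigma/2$ becomes valid for
$k \ge k_0$, which contradicts the fact that $\sigma_k \to 0$, $k \in K_1$, $k \to \infty$.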


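Case 2. Assume that there is a limit point $\tilde{x} \in X$ of the sequence $\{x_k\}$, which differs from
$\bar{x}$. As before, we shall assume that $\bar{x} \in X \setminus X^*$. Then $\delta > 0$ exists such that $\tilde{x} \in X \setminus U_\delta(\bar{x})$.
Since $\bar{x}$ and $\tilde{x}$ are two distinct limit points of the sequence, there will exist, for any integer $N$, a
number $k > N$ and a number $m \ge 1$, such that $x_k \in U_{\delta/2}(\bar{x})$, $x_{k+i} \in U_\delta(\bar{x})$, $i = 0, 1, \ldots, m-1$,
$x_{k+m} \in X \setminus U_\delta(\bar{x})$.

We now use the inequality (9):

$$\varphi(x_k) - \varphi(x_{k+m}) = \sum_{i=k}^{k+m-1} [\varphi(x_i) - \varphi(x_{i+1})],$$

and since $x_i \in U_\delta(\bar{x})$, $i = k, k+1, \ldots, k+m-1$, we have $\sigma_i \ge \bar\sigma/2$, $i = k, k+1, \ldots, k+m-1$.
In addition, $\zeta_i \ge \beta_i$ for all numbers $i$, so that

$$\varphi(x_k) - \varphi(x_{k+m}) \ge \frac{1}{4}\,\alpha\lambda\bar\sigma \sum_{i=k}^{k+m-1} \min\left\{\beta_i,\ \frac{\bar\sigma}{2L}\right\}.$$

Notice that

$$\sum_{i=k}^{k+m-1} \min\left\{\beta_i,\ \frac{\bar\sigma}{2L}\right\} \ge \min\left\{\sum_{i=k}^{k+m-1} \beta_i,\ \frac{\bar\sigma}{2L}\right\}.$$

For, if all the $\beta_i \le \bar\sigma/2L$, the left-hand side is equal to the sum of the $\beta_i$, while otherwise at
least one of its terms is equal to $\bar\sigma/2L$. Since $\|s_i\| = 1$, we also have

$$\sum_{i=k}^{k+m-1} \beta_i = \sum_{i=k}^{k+m-1} \|x_{i+1} - x_i\| \ge \|x_{k+m} - x_k\| \ge \frac{\delta}{2},$$

and hence

$$\varphi(x_k) - \varphi(x_{k+m}) \ge \frac{1}{4}\,\alpha\lambda\bar\sigma \min\left\{\frac{\delta}{2},\ \frac{\bar\sigma}{2L}\right\} = \text{const} > 0.$$

This last inequality contradicts the convergence of the sequence $\{\varphi(x_k)\}$.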


In short, any limit point of the sequence $\{x_k\}$ belongs to the set $X^*$. Let us show that

$$\lim_{k\to\infty} \rho(x_k, X^*) = 0.$$

If this is not the case, then a number $\Delta > 0$ and a subsequence $\{x_k\}$, $k \in K_2 \subset K$, will exist, such
that $\rho(x_k, X^*) \ge \Delta$ for all $k \in K_2$. But in view of the compactness and what has been said above,
there will be a collection of numbers $K_3 \subset K_2$ for which we have

$$\lim_{k\to\infty,\ k\in K_3} x_k = \bar{x} \in X^*,$$

which contradicts our assumption that $\rho(x_k, X^*) \ge \Delta$ for all $k \in K_2$.


Notice that the convergence theorem is proved for any initial approximation $x_0 \in X$. In
actual problems we usually have information about a point $x_0$ which is reasonably close to the required
point $x^* \in X^*$.


Let the point $x_0$ belong to some connected component $X_0$ of the set $\{x \in X \mid \varphi(x) \le \varphi(x_0)\}$.
If the set $X_0$ is bounded, and for any $x^* \in X^* \cap X_0$ we have

$$\varphi(x^*) = \min_{x\in X_0} \varphi(x),$$

then

$$\lim_{k\to\infty} \varphi(x_k) = \varphi(x^*), \qquad \lim_{k\to\infty} \rho(x_k, X^* \cap X_0) = 0.$$

For, since $\varphi(x_{k+1}) \le \varphi(x_k)$ for all $k$, then $x_k \in X_0$, $k = 0, 1, \ldots$. By the convergence
theorem, a collection of indices $K_1 \subset K$ exists, such that $x_k \to \bar{x}$, $k \in K_1$, $k \to \infty$, and $\bar\sigma(\bar{x}, 0) = 0$,
so that $\bar{x} \in X^* \cap X_0$. In view of the convergence of the sequence $\{\varphi(x_k)\}$ we get

$$\lim_{k\to\infty} \varphi(x_k) = \varphi(\bar{x}).$$
In conclusion it may be mentioned that our entire discussion also holds for the case when
the condition $\langle s, s\rangle \le 1$ in problem (1) is replaced by the condition

$$\max_{j = 1, 2, \ldots, n} |s_j| \le 1;$$

here, $s_j$ denotes the $j$-th component of the vector $s = (s_1, \ldots, s_n)$. In this case, problem (1)
becomes a problem of linear programming.


Translated by D. E. Brown 
REGULARITY CONDITIONS AND NECESSARY CONDITIONS FOR A
MAXIMIN WITH CONNECTED VARIABLES*


V. V. FEDOROV
Moscow

REGULARITY conditions are introduced for non-convex problems. In conjunction with the
method of penalties, they enable the necessary conditions for an optimum to be obtained in
minimax problems.


Introduction 


The directional differentiability of a minimum function was proved in [1, 2] , thereby 
enabling the necessary conditions for optimality to be derived in maximin problems with connected 
variables. Since, however, the assumptions ensuring the existence of directional derivatives in the 
case of connected variables are quite rigid, it seems useful to obtain the necessary conditions for 
optimality under wider assumptions. This can be done using the method of penalty functions, which 
has only quite recently come to be systematically used to derive optimality conditions (see e.g., 
[3, 4]). The method of penalties proves to be especially effective in this sense in complex maximin
problems, thanks to the wide range of convergence theorems that is now available for it [4].


In general terms, the method of obtaining the necessary conditions is as follows. The initial 
problem is first reduced to a parametric family of simpler problems which have previously been 
investigated. We then pass to the limit with respect to the penalty parameter in the optimality 
conditions for the penalty problems, and thereby obtain the optimality conditions in the initial 
problem. 


An approach of this kind offers a basis for considering the method of penalty functions as
an “algorithm” for stating the optimality conditions in extremal problems. As distinct from the
existing general schemes for analyzing extremal problems [5, 6] , the method of penalties in 
minimax problems does not lead to necessary conditions for optimality of a general type (of the 
Euler—Lagrange equation type). Each new more complicated problem has to be analyzed on the 
basis of results previously obtained, while utilizing the “algorithm” stated above. It should be 
mentioned that, as a rule, each such “step” does not involve too serious difficulties, provided that 
we arrange for the problems to become only gradually more complicated. 


In our view, there is no justification for attempting to obtain optimality conditions of the 
most general possible kind in minimax problems. First, such conditions (even if they could be 
obtained) would be extremely complicated and unwieldy. Second, the attempt would require as
a preliminary a refined idea of what is meant by a general minimax problem. The latter is 
unattainable in principle. The point is that any minimax problem can be regarded as the consequence 
of applying the principle of the best guaranteed result (or other optimality principle) in some game 


*Zh. vychisl. Mat. mat. Fiz., 17, 1, 79-90. 1977. 
in certain strategy sets. Since new systems and methods of operation (strategies) are constantly 
arising, new minimax problems will make their appearance. It follows from what has been said 
that the above-mentioned feature of the general scheme of obtaining necessary conditions on the 
basis of the method of penalties can be looked on as a merit rather than a drawback. Of course, 
this does not mean that concrete classes of minimax problems cannot be more carefully studied 
with the aid of traditional schemes. 


In the present paper we show that, if discontinuous penalties are used [4] , the well-known 
conditions for the exact solution of a problem with constraints by the penalty function method are 
also conditions for regularity of the problem, and lead to stronger conditions for optimality. This 
is also true for the maximin with connected variables. There is no difficulty in extending all our 
conclusions to the problem of seeking a multiple maximin with constraints, so that a wide range 
of problems in operations research and the theory of games is covered. 


1. Regularity conditions

We consider the problem of finding

$$\max_{x\in A} F(x) = F(x^0), \qquad (1)$$

$$A = \{x \in X \mid \varphi_i(x) \ge 0,\ 1 \le i \le m\}. \qquad (2)$$

Definition 1. We shall say that the functional constraints $\varphi_i(x) \ge 0$, $1 \le i \le m$, specifying
the set $A$, are regular, if numbers $K > 0$ and $\delta > 0$ exist, such that, for all $x \in (V_\delta(A) \setminus A) \cap X$, we have

$$\min_{1 \le i \le m} \varphi_i(x) \le -K\rho(x, A). \qquad (3)$$

Here and below, $\rho$ is the metric in $X$, and $V_\delta(A)$ is the $\delta$-neighbourhood of the set $A$.
Whether the regularity conditions (3) are satisfied will naturally depend on how the set $A$ is
described, i.e., on the form of the functions $\varphi_i(x)$. However, any set $A$ can formally be specified
in the form (2) by regular constraints. For this, we have to put $m = 1$ and $\varphi_1(x) = -\rho(x, A)$.


For conditions (3) to be satisfied, it is sufficient that at least one of the following conditions
be satisfied:


1) $\varphi_i(x)$ are linear functions, and $X$ is a convex set of Euclidean space $E_n$ [4];


2) $\varphi_i(x)$ are concave and satisfy Slater's condition in a bounded convex set $X$, i.e., a point
$\tilde{x} \in X$ exists, for which $\varphi_i(\tilde{x}) > 0$, $1 \le i \le m$ [‘];


3) $X$ is a finite set.


In short, the inequality (3) expresses a characteristic property inherent in the well-known
regularity conditions [7] in convex programming (at least for bounded $X$). At the same time,
condition (3) holds for a class of functions wider than the class of concave functions. Let us
mention an example:

$$A = \{x \in X \mid \max_{1 \le i \le m} \varphi_i(x) \ge 0\},$$

where $\varphi_i(x)$ are concave functions on the bounded convex set $X$, and points $\tilde{x}_i \in X$ exist, such that
$\varphi_i(\tilde{x}_i) > 0$, $1 \le i \le m$.


A similar example may be constructed with linear $\varphi_i(x)$.


Condition (3) implies geometrically that, in a $\delta$-neighbourhood of the set $A$, the function

$$\varphi(x) = \min_{1 \le i \le m} \varphi_i(x)$$

decreases at least as fast as a linear function of the distance to the set $A$. While it may not be easy
to check condition (3) directly, the stock of such functions is clearly considerable.


The following theorem, proved in [4], will be required below:


Theorem 1


Let the functions $\varphi_i(x)$ be continuous, let $F(x)$ satisfy a Lipschitz condition in the compact
set $X$, and let the regularity conditions (3) hold. Then, for all sufficiently large $C$,

$$0 \le \max_{x\in X} L_q(x, C) - \max_{x\in A} F(x) \le \begin{cases} B/C^{1/(q-1)}, & q > 1, \\ 0, & q = 1, \end{cases}$$

$$\rho(x^q(C), A) \le \begin{cases} B/C^{1/(q-1)}, & q > 1, \\ 0, & q = 1, \end{cases}$$

where

$$L_q(x, C) = F(x) - C \sum_{i=1}^{m} |\min(0, \varphi_i(x))|^q,$$

$$L_q(x^q(C), C) = \max_{x\in X} L_q(x, C), \quad x^q(C) \in X,$$

and $B$ is a constant independent of $C$.

Theorem 1 establishes an error estimate in the case when problem (1), (2) is solved by the
method of penalties. In particular, the penalty function $L_1(x, C)$ gives the exact solution of the
problem for a certain finite $C$. We know [4, 8] that, for a concave problem (1), (2), this is equivalent
to the Lagrange function having a saddle point.
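The behaviour described here can be observed on a one-dimensional toy problem. The sketch below is illustrative only and rests on assumptions made here: $X = [-2, 2]$, $F(x) = x$ and a single constraint $\varphi_1(x) = -x \ge 0$, so that $A = [-2, 0]$ and the constrained maximum is attained at $x = 0$; the maximizer of the penalty function is found by a plain grid search, and the function names are hypothetical.

```python
def penalised(x: float, c: float, q: int) -> float:
    """L_q(x, C) = F(x) - C * |min(0, phi(x))|**q with F(x) = x and phi(x) = -x."""
    return x - c * abs(min(0.0, -x)) ** q


def penalty_maximiser(c: float, q: int, points: int = 4001) -> float:
    """Grid maximiser of L_q(., C) over X = [-2, 2]."""
    grid = [-2.0 + 4.0 * i / (points - 1) for i in range(points)]
    return max(grid, key=lambda x: penalised(x, c, q))


if __name__ == "__main__":
    for c in (10.0, 100.0):
        x = penalty_maximiser(c, 2)
        print("q = 2, C =", c, "maximiser", x, "distance to A", max(0.0, x))
    print("q = 1, C = 2, maximiser", penalty_maximiser(2.0, 1))
```

For $q = 2$ the maximizer is $1/(2C)$, so its distance to $A$ decays like $1/C$, whereas for $q = 1$ and $C > 1$ the maximizer is the exact solution $x = 0$.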


We shall now obtain the necessary conditions, satisfied by any solution of problem (1), (2).


Theorem 2


In (1), (2), let the functions $F(x)$ and $\varphi_i(x)$ be continuously differentiable in the convex closed
set $X \subset E_n$. Then, numbers $\lambda_0, \lambda_1, \ldots, \lambda_m \ge 0$ which do not vanish simultaneously, exist, such
that

$$-\left\{\lambda_0 F'(x^0) + \sum_{i=1}^{m} \lambda_i \varphi_i'(x^0)\right\} \in K_X^*(x^0), \qquad (4)$$

$$\lambda_i \varphi_i(x^0) = 0, \quad 1 \le i \le m.$$

If, in addition, the regularity conditions (3) hold, then $\lambda_0 > 0$ (it can be assumed without
loss of generality that $\lambda_0 = 1$).


Here, $K_X^*(x^0)$ is the cone conjugate to the cone of feasible directions of the set $X$ at the
point $x^0$.


Proof. 1. We put $F_1(x) = F(x) - \|x - x^0\|^2$. Then $F_1(x)$ has a unique realization $x^0$ of its
maximum in $A$. All our remaining arguments will be carried out in the compact set $S \cap X$, where
$S$ is a closed sphere with center $x^0$. Since all the conditions for convergence of the method of
penalties hold in $S \cap X$, we have

$$\max_{x\in A} F_1(x) = \lim_{C\to\infty}\ \max_{x\in S\cap X} \left\{ L_2(x, C) = F_1(x) - C \sum_{i=1}^{m} [\min(0, \varphi_i(x))]^2 \right\},$$

and the realizations $x^0(C)$ of the maximum of $L_2(x, C)$ in $S \cap X$ are convergent to $x^0$ as $C \to \infty$. For
sufficiently large $C$, we write down the condition for an extremum of $L_2(x, C)$ in $S \cap X$:

$$-\left\{F_1'(x^0(C)) - 2C \sum_{i=1}^{m} \min(0, \varphi_i(x^0(C)))\,\varphi_i'(x^0(C))\right\} \in K_X^*(x^0(C)). \qquad (5)$$

We now normalize (5) by

$$m(C) = 1 + \sum_{i=1}^{m} \mu_i(C), \quad \text{where } \mu_i(C) = -2C \min(0, \varphi_i(x^0(C))),$$

i.e., we introduce $\lambda_0(C) = 1/m(C)$, $\lambda_i(C) = \mu_i(C)/m(C)$ and pass to the limit in (5) as $C \to \infty$.
Then, a sequence $\{C_k\} \to \infty$ can be chosen, such that $\lambda_0(C_k) \to \lambda_0$, $\lambda_i(C_k) \to \lambda_i$ (since
$\sum_{i=0}^{m} \lambda_i(C) = 1$, $\lambda_i(C) \ge 0$) and at the same time, $x^0(C_k) \to x^0$.

The mapping $x \to K_X^*(x)$ is closed, and $F_1'(x)$, $\varphi_i'(x)$ are continuous; hence (4) follows
from (5). Further, if $\varphi_i(x^0) > 0$, then, for sufficiently large $C_k$, the coefficient $\lambda_i(C_k) = 0$, i.e.,
the complementary slackness conditions $\lambda_i \varphi_i(x^0) = 0$ are satisfied.


2. Now let the regularity conditions (3) hold. We shall show that the sum $\sum_{i=1}^{m} \mu_i(C)$ is then
uniformly bounded with respect to C. This is all that is needed to complete the proof of the theorem,
since, on passing to the limit in relation (5) as $C_k \to \infty$, the coefficient of $F'(x^0)$ will be equal to 1.

We have

$$\sum_{i=1}^{m} \mu_i(C) \le m \Big| 2C \min_{1 \le i \le m} \min(0, \varphi_i(x^0(C))) \Big| \le 2mCN \rho(x^0(C), A),$$

where N is the Lipschitz constant of the functions $\varphi_i(x)$ in the sphere S. By Theorem 1,
$\rho(x^0(C), A) \le O(1/C)$; hence $\sum_{i=1}^{m} \mu_i(C)$ is bounded. The theorem is proved.
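As a hedged numerical illustration of this argument (not part of the original paper), the sketch below uses a one-dimensional problem chosen here for convenience: maximize F(x) = -(x - 2)^2 subject to phi(x) = 1 - x >= 0, whose solution is x^0 = 1 with multiplier lambda_1 = 2 when lambda_0 = 1. It maximizes the quadratic penalty function L_2(x, C) by ternary search and evaluates mu(C) = -2C min(0, phi(x^0(C))). All function names are assumptions of this sketch.

```python
def penalty_function(x, C):
    """L_2(x, C) = F(x) - C * min(0, phi(x))^2 for the toy problem (assumed example)."""
    F = -(x - 2.0) ** 2          # objective to be maximized
    phi = 1.0 - x                # constraint phi(x) >= 0
    return F - C * min(0.0, phi) ** 2


def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a strictly concave function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def penalty_point_and_multiplier(C):
    """Return x^0(C) and mu(C) = -2C * min(0, phi(x^0(C)))."""
    x = argmax_concave(lambda t: penalty_function(t, C), 0.0, 3.0)
    mu = -2.0 * C * min(0.0, 1.0 - x)
    return x, mu


if __name__ == "__main__":
    for C in (1.0, 10.0, 100.0, 1e4):
        x, mu = penalty_point_and_multiplier(C)
        print(f"C={C:g}  x0(C)={x:.6f}  distance={x - 1.0:.2e}  mu(C)={mu:.6f}")
```

For this toy problem the maximizer is x^0(C) = 1 + 1/(1 + C), so the distance to the admissible set is O(1/C) and mu(C) = 2C/(1 + C) stays bounded and tends to 2, in line with part 2 of the proof.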
Notes. 1. The scheme of proof of our theorem realizes the "algorithm" mentioned in the
Introduction, for obtaining necessary conditions for optimality in problem (1), (2). In effect, the
problem has been reduced by the method of penalties to unconstrained optimization of the function
L(x, C), thereby enabling the necessary conditions for an unconstrained extremum to be applied.
For problems more complicated than (1), (2), different theorems on the convergence of the method
of penalty functions have to be employed, along with the relevant necessary conditions (see e.g.,
Sections 2 and 3, and also [3, 4]).


2. The theorem has the greatest interest in the case when $\lambda_0 = 1$, since in other cases the
necessary conditions (4) are in no way connected with the function to be optimized F(x).


3. As we indicated above, the regularity conditions are satisfied not only in convex
problems. Hence Theorem 2 establishes the existence of Lagrange multipliers $\lambda_0 = 1, \lambda_1, \ldots, \lambda_m$
for a wider class of problems.


4. When the regularity conditions (3) hold, the theorem can be proved in a different way. In
fact, by Theorem 1, problem (1), (2) reduces for finite $C_0$ to seeking the maximin

$$\max_{x \in A} F(x) = \max_{x \in S \cap X} \Big\{ F(x) + C_0 \sum_{i=1}^{m} \min(0, \varphi_i(x)) \Big\}
= \max_{x \in S \cap X} \; \min_{0 \le \lambda_i \le C_0, \; 1 \le i \le m} \Big\{ F(x) + \sum_{i=1}^{m} \lambda_i \varphi_i(x) \Big\}.$$

It remains only to apply the necessary conditions for a maximin [2-4].
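The second equality in Note 4 rests on the elementary identity min over 0 <= lambda <= C_0 of lambda * phi = C_0 * min(0, phi), applied to each constraint separately. The following minimal sketch checks it on a grid of multipliers; the function names and the grid resolution are assumptions made here for illustration.

```python
def exact_penalty_term(phi, C0):
    """C0 * min(0, phi): the non-smooth (q = 1) penalty contribution of one constraint."""
    return C0 * min(0.0, phi)


def inner_min_over_multiplier(phi, C0, grid=1001):
    """min of lambda * phi over lambda in [0, C0], evaluated on a uniform grid.

    The grid contains both end points, where the minimum of a linear function
    of lambda is attained.
    """
    return min((C0 * k / (grid - 1)) * phi for k in range(grid))


if __name__ == "__main__":
    for phi in (-2.0, -0.3, 0.0, 0.7):
        print(phi, exact_penalty_term(phi, 5.0), inner_min_over_multiplier(phi, 5.0))
```

For phi < 0 the minimum is attained at lambda = C_0, and for phi >= 0 at lambda = 0, which is exactly the complementary structure of the multipliers in Theorem 2.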


2. Maximin with connected variables 


Let us now turn to the problem of seeking

$$\sup_{x \in X} \min_{y \in B(x)} F(x, y) \qquad (6)$$

and the point $x^0 \in X$ (if it exists), realizing

$$\min_{y \in B(x^0)} F(x^0, y) = \sup_{x \in X} \min_{y \in B(x)} F(x, y).$$

The many-valued mapping B(x) can be assumed to be specified in the form

$$B(x) = \{ y \in Y \mid g_j(x, y) \ge 0, \; 1 \le j \le m \}$$

and to be non-empty for all $x \in X$.


For the existence of an optimal strategy $x^0$, it is sufficient that the mapping B(x) be
continuous in the Hausdorff metric, and F(x, y) continuous in the compact sets X, Y. We know
[4] that the sufficient condition for B(x) to be Hausdorff-continuous is

$$\overline{B^0(x)} = B(x) \ne \varnothing \quad \forall x \in X,$$

where $B^0(x) = \{ y \in Y \mid g_j(x, y) > 0, \; 1 \le j \le m \}$, and $\overline{B}$ denotes the closure of the set B.


We shall use below a different sufficient condition for B(x) to be continuous, following
from the regularity conditions.

Definition 2. The mapping B(x) is regular at the point $x^0 \in X$, if numbers K, $\delta > 0$ exist,
such that, for all $x \in V_\delta(x^0)$ and all $y \in (V_\delta(B(x)) \setminus B(x)) \cap Y$ we have

$$\min_{1 \le j \le m} g_j(x, y) \le -K \rho(y, B(x)). \qquad (7)$$

The mapping B(x) is regular in the set X if it is regular at any point $x \in X$ with fixed
parameters K, $\delta > 0$.


If B(x) is a constant mapping, Definition 2 is obviously the same as Definition 1.


It is easy to state sufficient conditions for the mapping B(x) to be regular, similar to the
conditions quoted in Section 1; e.g.,

1) $g_j(x, y) = D_j y + f_j(x)$, where $D_j$ is a matrix; in this case, B(x) is regular in any convex set
X;

2) $g_j(x, y)$ are continuous in $X \times Y$, and concave with respect to y, while Y is a convex
compactum, and $\bar{y}$ exists, for which

$$\min_{1 \le j \le m} g_j(x^0, \bar{y}) > 0;$$

then, B(x) is regular at the point $x^0$.


However, in the same way as in Section 1, examples can be quoted of non-concave $g_j(x, y)$,
specifying regular mappings B(x).


Lemma 1 


If the many-valued mapping B(x), specified by the continuous functions $g_j(x, y)$ and the
compactum Y, is regular at the point $x^0$, then it is Hausdorff-continuous at $x^0$.


Proof. The upper semi-continuity of B(x) is obvious. We shall prove the lower semi-continuity
at the point $x^0$. Assume that there exist $x_k \to x^0$, $y^k \in B(x^0)$, $y^k \notin V_\delta(B(x_k))$ for $\delta > 0$. Then,
by (7),

$$\min_{1 \le j \le m} g_j(x_k, y^k) \le -K\delta < 0 \quad \forall k.$$

On the other hand, we can choose a subsequence $\{k_l\}$ such that $y^{k_l} \to y^0 \in B(x^0)$. Here,

$$\lim_{l \to \infty} \min_{1 \le j \le m} g_j(x_{k_l}, y^{k_l}) = \min_{1 \le j \le m} g_j(x^0, y^0) \ge 0,$$

and we arrive at a contradiction.


When B(x) is regular in X, Lemma 1 guarantees the existence of an optimal strategy $x^0$.


Theorem 3 


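Let

$$\frac{\partial}{\partial x} F(x, y), \quad \frac{\partial}{\partial y} F(x, y), \quad \frac{\partial}{\partial x} g_j(x, y), \quad \frac{\partial}{\partial y} g_j(x, y)$$

be continuous in the product of convex closed sets X, Y, where $X \subset E_n$, and Y is a bounded set
of Euclidean space. Then there exist in problem (6) numbers $p_i > 0$ and numbers $\lambda_0, \lambda_{ij} \ge 0$,
$1 \le i \le r \le n+1$, $1 \le j \le m$, not all vanishing, and also points $y_i$ such that $\sum_{i=1}^{r} p_i = 1$,

$$-\sum_{i=1}^{r} p_i \Big\{ \lambda_0 \frac{\partial}{\partial x} F(x^0, y_i) - \sum_{j=1}^{m} \lambda_{ij} \frac{\partial}{\partial x} g_j(x^0, y_i) \Big\} \in K_X^*(x^0), \qquad (8)$$

$$\Big\{ \lambda_0 \frac{\partial}{\partial y} F(x^0, y_i) - \sum_{j=1}^{m} \lambda_{ij} \frac{\partial}{\partial y} g_j(x^0, y_i) \Big\} \in K_Y^*(y_i), \qquad (9)$$

$$\lambda_{ij} g_j(x^0, y_i) = 0,$$

$$y_i \in R(x^0) = \{ y \in B(x^0) \mid F(x^0, y) = \min_{z \in B(x^0)} F(x^0, z) \}.$$

If the mapping B(x) is regular at the point $x^0$, then $\lambda_0 = 1$.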


Proof. We shall use the scheme of arguments of Theorem 2. We introduce $F_1(x, y) = F(x, y) - \|x - x^0\|^2$,
having the unique realization $x^0$ of the maximin

$$\sup_{x \in S \cap X} \min_{y \in B(x)} F_1(x, y).$$

By the theorem on penalty functions [4],

$$\sup_{x \in X \cap S} \min_{y \in B(x)} F_1(x, y) = \lim_{C \to \infty} \max_{x \in S \cap X} \min_{y \in Y} L_2(x, y, C),$$

$$L_2(x, y, C) = F_1(x, y) + C \sum_{j=1}^{m} [\min(0; g_j(x, y))]^2.$$

As $C_k \to \infty$, the realization of the maximin $x^0(C_k)$ of the function $L_2$ tends to $x^0$. We write the
necessary conditions for the point $x^0(C_k)$ [2]: there exist

$$p_i(C_k) > 0 \quad \text{and} \quad y_i(C_k) \in R(x^0(C_k)) = \{ y \in Y \mid L_2(x^0(C_k), y, C_k) = \min_{z \in Y} L_2(x^0(C_k), z, C_k) \}$$


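such that

$$\sum_{i=1}^{r} p_i(C_k) = 1,$$

$$-\sum_{i=1}^{r} p_i(C_k) \Big\{ \frac{\partial}{\partial x} F_1(x^0(C_k), y_i(C_k)) - \sum_{j=1}^{m} s_{ij}(C_k) \frac{\partial}{\partial x} g_j(x^0(C_k), y_i(C_k)) \Big\} \in K_X^*(x^0(C_k)). \qquad (10)$$

Since $y_i(C_k)$ realizes the minimum of $L_2(x^0(C_k), y, C_k)$, we have

$$\Big\{ \frac{\partial}{\partial y} F_1(x^0(C_k), y_i(C_k)) - \sum_{j=1}^{m} s_{ij}(C_k) \frac{\partial}{\partial y} g_j(x^0(C_k), y_i(C_k)) \Big\} \in K_Y^*(y_i(C_k)). \qquad (11)$$

Here, $s_{ij}(C_k) = -2C_k \min(0; g_j(x^0(C_k), y_i(C_k)))$.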


We normalize (10) and (11) to

$$m(C_k) = 1 + \sum_{i,j} s_{ij}(C_k) \ge 1,$$

by introducing

$$\lambda_0(C_k) = \frac{1}{m(C_k)}, \qquad \lambda_{ij}(C_k) = \frac{s_{ij}(C_k)}{m(C_k)}.$$

Obviously, $\lambda_0(C_k)$, $\lambda_{ij}(C_k)$ do not all vanish, since

$$\lambda_0(C_k) + \sum_{i,j} \lambda_{ij}(C_k) = 1.$$

We now pass to the limit in (10), (11) with respect to the subsequence $\{C_{k_l}\}$ in such a way
that $y_i(C_{k_l}) \to y_i \in R(x^0)$, $\lambda_0(C_{k_l}) \to \lambda_0$, $\lambda_{ij}(C_{k_l}) \to \lambda_{ij}$.


We then obtain from (10), (11) the conditions (8), (9) and the conditions of supplementary
non-rigidity.


If the mapping satisfies the regularity conditions at the point $x^0$, we can easily show, in the
same way as in Theorem 2, that the sums $\sum_{j=1}^{m} s_{ij}(C)$
are uniformly bounded with respect to C. The conditions (8) and (9) here follow from (10) and
(11), on passing to the limit as $C_{k_l} \to \infty$ without preliminary normalization.


It can easily be seen that the number of relations in the necessary conditions of Theorem 3 
is equal to the number of unknown parameters, so that in principle the conditions contain 
sufficient information for finding x°. 


In the case when Y is the entire space and the regular mapping B(x) is bounded in the
neighbourhood of $x^0$, Theorem 3 may be stated in a different way. We introduce the Lagrange
function

$$L(x, y, \lambda) = F(x, y) - \sum_{j=1}^{m} \lambda_j g_j(x, y), \qquad \lambda_j \ge 0,$$

and the set

$$\Lambda(x^0, y) = \Big\{ \lambda \ge 0 \;\Big|\; \frac{\partial L(x^0, y, \lambda)}{\partial y} = 0, \; \lambda_j g_j(x^0, y) = 0 \Big\}.$$

Since the mapping B(x) is regular at the point $x^0$, and in view of Theorem 2, $\Lambda(x^0, y)$ will be
non-empty for $y \in R(x^0)$.


Let $\Lambda$ denote the set of functions $\lambda(\cdot)$, specified in $R(x^0)$ and such that $\lambda(y) \in \Lambda(x^0, y)$
for all $y \in R(x^0)$.


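Corollary
The assertion of Theorem 3 is equivalent to the existence of $\lambda(\cdot) \in \Lambda$ such that

$$-M(\lambda(\cdot)) \cap K_X^*(x^0) \ne \varnothing, \qquad (12)$$

$$M(\lambda(\cdot)) = \overline{\mathrm{co}} \Big\{ z \in E_n \;\Big|\; z = \frac{\partial L(x^0, y, \lambda(y))}{\partial x}, \; y \in R(x^0) \Big\},$$

where $\overline{\mathrm{co}}\, A$ is the closure of the convex hull of the set A. From condition (12) we obtain the following
necessary condition for the point $x^0$:

$$\inf_{\lambda(\cdot) \in \Lambda} \; \sup_{\|g\| = 1, \; g \in K_X(x^0)} \; \min_{y \in R(x^0)} \Big( \frac{\partial L(x^0, y, \lambda(y))}{\partial x}, g \Big) \le 0 \qquad (13)$$

(see Theorems 2 and 3 of [2]).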


To check condition (13), we need to solve a "non-classical" problem of seeking a
min-max-min. In fact, the infimum in (13) is taken with respect to the set of functions $\lambda(\cdot)$, but
nevertheless it is not possible to write (13) in the form

$$\sup_{\|g\| = 1, \; g \in K_X(x^0)} \; \min_{y \in R(x^0)} \; \min_{\lambda \in \Lambda(x^0, y)} \Big( \frac{\partial L(x^0, y, \lambda)}{\partial x}, g \Big) \le 0,$$

i.e., to go over to a minimax problem in Euclidean space. This suggests the idea that the necessary
conditions are more convenient to use in the form (12), rather than in the form (13). The same
conclusion holds for the problem of seeking a maximin with splitting variables [2, 3].


Notice that problems of the type (13) have recently appeared with increasing frequency
in connection with the analysis of hierarchical control systems.


It was shown in [2] (Theorem 4.2) that, under more rigid assumptions, ensuring the directional
differentiability of the minimum function

$$f(x) = \min_{y \in B(x)} F(x, y),$$

conditions (12) hold for any function $\lambda(\cdot) \in \Lambda$. In this case, the analogue of condition (13) is

$$\sup_{\lambda(\cdot) \in \Lambda} \; \sup_{\|g\| = 1, \; g \in K_X(x^0)} \; \min_{y \in R(x^0)} \Big( \frac{\partial L(x^0, y, \lambda(y))}{\partial x}, g \Big) \le 0,$$

which, in view of the fact that $\sup_{\lambda(\cdot) \in \Lambda}$ and $\min_{y \in R(x^0)}$ permute, is equivalent to

$$\sup_{\|g\| = 1, \; g \in K_X(x^0)} \; \min_{y \in R(x^0)} \; \max_{\lambda \in \Lambda(x^0, y)} \Big( \frac{\partial L(x^0, y, \lambda)}{\partial x}, g \Big) \le 0,$$

i.e., an ordinary maximin problem.


3. Sequential maximin with constraints 


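Let $x_1^0$ realize

$$\max_{x_1 \in A_1} \min_{y_1 \in B_1(x_1)} \ldots \max_{x_n \in A_n(x_1, y_1, \ldots, y_{n-1})} \min_{y_n \in B_n(x_1, \ldots, x_n)} \max_{x_{n+1} \in A_{n+1}(x_1, \ldots, y_n)} F(x_1, y_1, \ldots, x_n, y_n, x_{n+1}), \qquad (14)$$

or more briefly,

$$M = \Big[ \max_{x_i \in A_i} \min_{y_i \in B_i} \Big]_{i=1}^{n} \max_{x_{n+1} \in A_{n+1}} F(x_1, y_1, \ldots, y_n, x_{n+1})
= \Big[ \min_{y_i \in B_i} \max_{x_{i+1} \in A_{i+1}} \Big]_{i=1}^{n} F(x_1^0, y_1, \ldots, y_n, x_{n+1}). \qquad (15)$$

Here, the mappings $A_i$ and $B_j$ are specified in the form

$$A_i(x_1, y_1, \ldots, y_{i-1}) = \{ x_i \in X_i \mid h_{il}(x_1, y_1, \ldots, y_{i-1}, x_i) \ge 0 \},$$

$$B_j(x_1, y_1, \ldots, x_j) = \{ y_j \in Y_j \mid g_{jm}(x_1, y_1, \ldots, x_j, y_j) \ge 0 \},$$

$1 \le i \le n+1$, $1 \le j \le n$, where l and m run over finite sets of indices.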


We shall make the following assumptions: 


1) the sets $X_i$, $Y_j$ are convex and closed, and belong to $r_i$- and $s_j$-dimensional Euclidean spaces
respectively;
2) the mappings $A_i$ and $B_j$ are bounded and Hausdorff-continuous;
3) the functions $h_{il}$, $g_{jm}$ have continuous partial derivatives.

We introduce the Lagrange function connected with the problem (14):

$$L(x_1, y_1, \ldots, x_n, y_n, x_{n+1}; \lambda_0, \lambda_1, \ldots, \lambda_n, \mu_1, \ldots, \mu_{n+1})
= \lambda_0 F(x_1, \ldots, y_n, x_{n+1}) - \sum_{j=1}^{n} (\lambda_j, g_j(x_1, y_1, \ldots, y_j)) + \sum_{i=1}^{n+1} (\mu_i, h_i(x_1, y_1, \ldots, x_i)), \qquad (16)$$

where $\lambda_j = \{\lambda_{jm}\}$, $\mu_i = \{\mu_{il}\}$ are vectors, $g_j(x_1, \ldots, y_j) = \{g_{jm}(x_1, y_1, \ldots, y_j)\}$,
$h_i(x_1, \ldots, x_i) = \{h_{il}(x_1, \ldots, x_i)\}$ are vector functions; the brackets denote the scalar product.


Theorem 4


Let the above assumptions - satisfied and let x,®<€ X, be the ve of problem (14), 


(2) (on) 


9 Piss... 


(15). Then, there exist numbers p;.”, 9 


ho > 0, not all zero, and also points 7", x$'?”, 


s+i, 1Si.<r,t+s,tr.+1, 


aay be se nay (2 n+1 


(41) 
O Fiatecaae Seer 


(ii) (i551 --. 
o6 © By Bn: 


in). + ths, pies) ; 


(1) (2) 


iy Pia: . 


is. (isj1) 


> Bi, Po 9° 


‘dt. Q i.) 
(at ) (Xs = tt coe Y 


(i45,.. 
(wy 


M=[ max min Ji-2 max F(2,°,y, 


x ;EA; y,e8; Tn41 Anas 


=[ min max ];~2.F (z,°,y 


fej er, +20:, 


a In) 
’ 


(2n) 


fis Pi,j.. Jn + 


“4 Pri 
AMS Lae 


GO, Hd, 


> 0. and vectors a i yee ih) 


ati dy) 1<i<crt+1, 1<j,;Sr,+ 


2 09 En+1 


: inchaok ts 1<t<n, 0<kS<n, 


(2 n) eae 
iii: vty Pan 


V by, Jiyos- 


"pads (4:91) 


hel 
sho, A adie n aa, ws ocne 


LigioeeJ inj, oe 
(%s n). dies . aa NS teed | 


n+1 


oh va) & Ky, (y"), 


(i) (idt- Gy), 
; » Un4+1 ‘n ; 


L (x;° Yi 


(ii “In)) = Kx. (2°), 


: 0 i 
ms Rass (xy ’ we — mt) ) =0, 


(41) 


Ce ee 


’ Tn4s) 







4 0 (it) (41h) 
me max OFT, yi, ty. «+ Ln4s) 


© FA ns 


== (x,°, ioe , EH ae ¢ 


If the mappings $A_i$, $B_j$ are regular in the product of the relevant $X_i$ and $Y_j$, then the
coefficient $\lambda_0 = 1$ in the Lagrange form (16).


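Proof. By the convergence theorem for the penalty method in problem (14) [4],

$$M = \lim_{C \to \infty} \Big[ \max_{x_i \in X_i} \min_{y_i \in Y_i} \Big]_{i=1}^{n} \max_{x_{n+1} \in X_{n+1}} L(x_1, y_1, \ldots, x_n, y_n, x_{n+1}, C),$$

$$L(x_1, y_1, \ldots, x_{n+1}, C) = F(x_1, y_1, \ldots, x_{n+1}) + C \sum_{j,m} [\min(0; g_{jm}(x_1, \ldots, y_j))]^2 - C \sum_{i,l} [\min(0; h_{il}(x_1, \ldots, x_i))]^2.$$

On writing here the necessary conditions for optimality for the multiple maximin [4], then
passing to the limit as $C \to \infty$, we obtain the theorem.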


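It would seem that the most important regular case of problem (14) corresponds to linear
connections, when

$$A_i = \Big\{ x_i \in X_i \;\Big|\; \sum_{k=1}^{i} F_{ik} x_k + \sum_{k=1}^{i-1} D_{ik} y_k \le a_i \Big\},$$

$$B_j = \Big\{ y_j \in Y_j \;\Big|\; \sum_{k=1}^{j} H_{jk} x_k + \sum_{k=1}^{j} G_{jk} y_k \le b_j \Big\},$$

where $F_{ik}$, $D_{ik}$, $H_{jk}$, $G_{jk}$ are matrices, and $a_i$, $b_j$ are vectors of the respective spaces.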


Translated by D. E. Brown


A STOCHASTIC QUASI-GRADIENT METHOD FOR SEEKING A MAXIMIN*
N. M. NOVIKOVA 
Moscow 


(Received 24 April 1975; revised 10 November 1975) 


THE ALGORITHM described below for seeking a maximin is a combination of the penalty method 
and the stochastic quasi-gradient method. A theorem on convergence to the set of solutions with 
probability unity is proved. The ALGOL program is given, along with test results. 


1. Consider the problem of finding

$$u^0 = \max_{x \in X} \min_{y \in Y} f(x, y) \qquad (1)$$

and the best guaranteeing strategy $x^0 \in X$:

$$\min_{y \in Y} f(x^0, y) = u^0. \qquad (2)$$

We assume that the function f(x, y) is continuous with respect to x and y in $X \subset E^p$ and $Y \subset E^m$,
which are closed bounded sets of Euclidean space. In accordance with [1], the problem is
equivalent to finding the maximum of $[u - \alpha_n \Phi_q(x, u)]$ with respect to x, u in the set $X \times [M_1, M_2]$
as $\alpha_n \to +\infty$ (problem (1')), where

$$\Phi_q(x, u) = \int_Y |\min[0; f(x, y) - u]|^q \, \sigma(dy), \qquad 1 \le q \le 2,$$

$$M_1 = \min_{X \times Y} f(x, y), \qquad M_2 = \max_{X \times Y} f(x, y).$$


Putting $F_n^q(x, u) = u - \alpha_n \Phi_q(x, u)$, we have

$$u^0 = \lim_{n \to \infty} \max_{X \times [M_1, M_2]} F_n^q(x, u) = \lim_{n \to \infty} F_n^q(x_n^0, u_n^0);$$

the vector $(x_n^0, u_n^0)$ realizes

$$\max_{X \times [M_1, M_2]} F_n^q(x, u),$$

any limit point of the sequence $\{(x_n^0, u_n^0)\}$ being a solution of problem (1), (2). In other words,
$\{(x_n^0, u_n^0)\}$ is convergent to $X^0 \times u^0$, where $X^0 = \{x^0\}$ is the set of solutions of Eq. (2).


*Zh. vychisl. Mat. mat. Fiz., 17, 1, 91-99, 1977.

In [2], under the extra assumption that f(x, y) satisfies a Lipschitz condition with respect
to y for all $x \in X$, the following convergence rate estimate is obtained for the method of penalties
(1'):

$$0 \le u_n^0 - u^0 \le D (1/\alpha_n)^{1/(m+q-1)}, \qquad D = \text{const},$$

for sufficiently large n. If we also recall (see [2]) that $u^0 \le F_n^q(x_n^0, u_n^0) \le u_n^0$, we get
$0 \le F_n^q(x_n^0, u_n^0) - u^0 \le D (1/\alpha_n)^{1/(m+q-1)}$, and since $\Phi_q(x^0, u^0) = 0$, we have

$$0 \le F_n^q(x_n^0, u_n^0) - F_n^q(x^0, u^0) \le D (1/\alpha_n)^{1/(m+q-1)}. \qquad (3)$$

Gradient methods* are usually employed to find $x_n^0$, $u_n^0$.


To avoid the difficulties involved in evaluating the integral in grad $\Phi_q(x, u)$, we use the
stochastic quasi-gradient method (see [3]). Let $\int_Y \sigma(dy) = 1$ (appropriate normalization); for
instance, let Y be the m-dimensional unit cube, and $\sigma$ the ordinary Lebesgue measure. The integral
in question may then be interpreted as the mathematical expectation of the integrand as a function
of the random variable y, subject to a uniform distribution law in Y. On randomly choosing $y_n \in Y$ (with
equal probability), we can construct the random sequence $\{(x_n, u_n)\}$, convergent almost surely
(with probability unity) to $X^0 \times u^0$, i.e., the set of solutions of the problem (1), (2).


We introduce $\chi_n^q(y \mid x, u) = u - \alpha_n |\min[0; f(x, y) - u]|^q$, a function of the random
variable y, dependent on x, u, whose mathematical expectation is

$$M_y \{ \chi_n^q(y \mid x, u) \} = F_n^q(x, u). \qquad (4)$$

In future we shall omit the index of the mathematical expectation. We assume that X is convex,
and f(x, y) is concave with respect to x, i.e., $\chi_n^q(y \mid x, u)$ is concave with respect to (x, u). We
denote the vector (x, u) by z; the set $Z = X \times [M_1, M_2]$ is convex. We define $\xi_n^q(y \mid z)$ as
the generalized gradient with respect to x, u for fixed y, of the function $\chi_n^q(y \mid z)$:

$$\langle \xi_n^q(y \mid z^1), z^1 - z^2 \rangle \le \chi_n^q(y \mid z^1) - \chi_n^q(y \mid z^2) \quad \forall z^1, z^2 \in Z \qquad (5)$$


(here, $\langle \cdot, \cdot \rangle$ is the scalar product). For instance, in the case of a discontinuous penalty, on taking
q = 1, we can choose for p = 1 the components of the vector $\xi_n^1(y \mid x, u) = (\xi_n(y \mid x, u); \eta_n(y \mid x, u))$
as follows:

$$\xi_n(y \mid x, u) = -\alpha_n \, \mathrm{sign}[\min(0; f(x, y) - u)] \, \frac{\partial f(x, y)}{\partial x},$$

$$\eta_n(y \mid x, u) = 1 + \alpha_n \, \mathrm{sign}[\min(0; f(x, y) - u)];$$

since the concave function is differentiable with respect to any direction, the right- or left-hand
partial derivative (say the left-hand, $\partial f^+(x, y)/\partial x$ or $\partial f^-(x, y)/\partial x$) will exist. Since the sets
Y, Z are bounded, the following estimate for the Euclidean norm of the generalized gradient is
obvious: $\|\xi_n^q(y \mid z)\|^2 \le C \alpha_n^2$, C = const (since the finite generalized gradient of f(x, y) with
respect to x exists).
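To make the q = 1 formulas concrete, the following hedged sketch checks the generalized-gradient inequality (5) numerically for the function f(x, y) = 1 + (x - 1/2)(y - 1/2) of Example I in the Appendix. Here chi(y | x, u) = u + alpha * min(0, f(x, y) - u), and the candidate gradient has x-component alpha * [f < u] * df/dx and u-component 1 - alpha * [f < u]. The names, the value of alpha and the sampling ranges are assumptions of this sketch.

```python
import random


def f(x, y):
    """Test function of Example I (linear, hence concave, in x)."""
    return 1.0 + (x - 0.5) * (y - 0.5)


def chi(y, x, u, alpha):
    """chi(y | x, u) = u - alpha * |min(0, f - u)| for q = 1."""
    return u + alpha * min(0.0, f(x, y) - u)


def xi(y, x, u, alpha):
    """Generalized gradient of chi with respect to (x, u) for q = 1."""
    active = 1.0 if f(x, y) < u else 0.0      # equals -sign[min(0, f - u)]
    return (alpha * active * (y - 0.5), 1.0 - alpha * active)


def max_violation(trials=2000, alpha=7.0, seed=1):
    """Largest value of <xi(z1), z1 - z2> - (chi(z1) - chi(z2)) over random samples."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        y = rng.random()
        x1, x2 = rng.random(), rng.random()
        u1, u2 = 2.0 * rng.random(), 2.0 * rng.random()
        gx, gu = xi(y, x1, u1, alpha)
        lhs = gx * (x1 - x2) + gu * (u1 - u2)
        rhs = chi(y, x1, u1, alpha) - chi(y, x2, u2, alpha)
        worst = max(worst, lhs - rhs)
    return worst


if __name__ == "__main__":
    print("largest violation of (5):", max_violation())
```

Up to rounding error the reported violation is non-positive, as it must be for a generalized gradient of a concave function.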


*The method of gradient projections (generalized gradient method).

Under our assumptions, condition (4) allows us to apply the result of [3] to problem (1').
Using this result, we can construct random sequences $\{z_n^k\}_{k=1}^{\infty}$ convergent almost surely to the
sets $\{z_n^0\}$ respectively. Then,

$$u^0 = \lim_{n \to \infty} \lim_{k \to \infty} \chi_n^q(y_n^k \mid z_n^k)$$

almost surely; $\{y_n^k\}_{k=1}^{\infty}$ are sequences of independent, uniformly distributed random variables
for any n = 1, 2, ... However, the use of the repeated limit is inconvenient when it comes to
practical computation. It can be avoided by the method described below, which takes account
of the specific nature of the problem.


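Assume that the numerical sequences $\{a_n\}$, $\{\alpha_n\}$ are such that the following conditions
are satisfied:

$$\sum_{n=1}^{\infty} a_n = +\infty, \qquad \alpha_n \to +\infty, \qquad (6)$$

$$\sum_{n=1}^{\infty} a_n \Big( \frac{1}{\alpha_n} \Big)^{1/(m+q-1)} < \infty, \qquad \sum_{n=1}^{\infty} a_n^2 \alpha_n^2 < \infty. \qquad (7)$$

We specify arbitrary $z_1 \in Z$ and define $\{z_n\}_1^{\infty}$ as follows:

$$z_{n+1} = \pi \{ z_n + a_n \xi_n^q(y_n \mid z_n) \}, \qquad y_n \in Y; \qquad (8)$$

$\pi\{\;\}$ is the projection onto Z, and $\{y_n\}_1^{\infty}$ is a sequence of independent, uniformly distributed
random variables.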
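The recursion (8) can be sketched in a few lines. The code below is a hedged illustration rather than the author's program: it applies (8) to the function f(x, y) = 1 + (x - 1/2)(y - 1/2) of Example I, with the step sequences a_n = n^(-r) and alpha_n = n^(r-s) (r = 1, s = 0.6, as suggested in the Appendix), q = 2, and projection onto Z = [0, 1] x [0, 2]. The function names, the starting point (0, 0), the seed and the number of steps are assumptions.

```python
import random


def f(x, y):
    """Example I: the maximin value is u0 = 1, attained at x0 = 1/2."""
    return 1.0 + (x - 0.5) * (y - 0.5)


def df_dx(x, y):
    """Partial derivative of f with respect to x."""
    return y - 0.5


def clip(v, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def quasi_gradient_maximin(steps=20000, q=2.0, r=1.0, s=0.6, seed=0):
    """Projected stochastic quasi-gradient iteration (8) for z = (x, u)."""
    rng = random.Random(seed)
    x, u = 0.0, 0.0                        # assumed starting point z_1
    for n in range(1, steps + 1):
        a_n = 1.0 / n ** r                 # step size a_n
        alpha_n = n ** (r - s)             # penalty coefficient alpha_n
        y = rng.random()                   # y_n uniform on Y = [0, 1]
        viol = max(0.0, u - f(x, y))       # |min(0, f(x, y) - u)|
        # weight alpha_n * q * viol^(q-1); zero when the constraint f >= u holds
        w = alpha_n * q * viol ** (q - 1.0) if viol > 0.0 else 0.0
        grad_x = w * df_dx(x, y)           # d chi / d x
        grad_u = 1.0 - w                   # d chi / d u
        x = clip(x + a_n * grad_x, 0.0, 1.0)
        u = clip(u + a_n * grad_u, 0.0, 2.0)
    return x, u


if __name__ == "__main__":
    print(quasi_gradient_maximin())
```

With these sequences a_n^2 alpha_n^2 = n^(-2s) is summable for s > 1/2, so conditions (6), (7) hold. The iterate u_n approaches u^0 from above at a rate governed by 1/alpha_n, which matches the remark in Section 2 that refinement of u^0 is slow.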


Theorem 


Under our assumptions, the probability of $\{z_n\}$ being convergent to $Z^0 = X^0 \times u^0$ is unity,
i.e., the distance $\rho(z_n, Z^0)$ between $z_n$ and $Z^0$ tends to zero almost surely.


Proof. Using (8) and (5), we have

$$\|z_{n+1} - z^0\|^2 \le \|z_n + a_n \xi_n^q(y_n \mid z_n) - z^0\|^2 = \|z_n - z^0\|^2
+ 2a_n \langle \xi_n^q(y_n \mid z_n), z_n - z^0 \rangle + a_n^2 \|\xi_n^q(y_n \mid z_n)\|^2$$
$$\le \|z_n - z^0\|^2 + 2a_n [\chi_n^q(y_n \mid z_n) - \chi_n^q(y_n \mid z^0)] + C a_n^2 \alpha_n^2.$$

Consequently, since $z^0 \in Z^0$ is arbitrary, we obtain

$$\min_{z^0 \in Z^0} \|z_{n+1} - z^0\|^2 \le \min_{z^0 \in Z^0} \|z_n - z^0\|^2 + 2a_n [\chi_n^q(y_n \mid z_n) - u^0] + C a_n^2 \alpha_n^2,$$

since $\chi_n^q(y_n \mid z^0) = u^0 \; \forall y_n \in Y$;

$$M \Big\{ \min_{z^0 \in Z^0} \|z_{n+1} - z^0\|^2 \;\Big|\; z_1, \ldots, z_n \Big\} \le \min_{z^0 \in Z^0} \|z_n - z^0\|^2 + C a_n^2 \alpha_n^2
+ 2a_n [F_n^q(z_n) - u^0]$$
$$\le \min_{z^0 \in Z^0} \|z_n - z^0\|^2 + 2a_n [F_n^q(z_n^0) - u^0] + C a_n^2 \alpha_n^2,$$

since

$$F_n^q(z_n) \le F_n^q(z_n^0) = \max_Z F_n^q(z);$$

the previous inequalities, relation (4) and the properties of the conditional mathematical expectation have
also been used.


Hence from (3) we have

$$M \{ \rho^2(z_{n+1}, Z^0) \mid z_1, \ldots, z_n \} \le \rho^2(z_n, Z^0) + C a_n^2 \alpha_n^2 + 2D a_n (1/\alpha_n)^{1/(m+q-1)},$$

i.e., the sequence $\{\rho^2(z_n, Z^0)\}_1^{\infty}$ is convergent with probability unity. For, since (see [3]) the
sequence

$$\{w_n\}_{n=1}^{\infty}, \qquad w_n = -\rho^2(z_n, Z^0) - \sum_{k=n+1}^{\infty} \big[ 2D a_k (1/\alpha_k)^{1/(m+q-1)} + C a_k^2 \alpha_k^2 \big]$$

forms a semi-martingale (as is clear from (7)), the required convergence will follow from the
properties of semi-martingales (see [4], Chapter 7, Section 4, Theorem 4.1 (1)).


After this, on taking the unconditional mathematical expectation of both sides of the
inequalities and performing the summation, we obviously get

$$\forall z^0 \in Z^0: \quad 0 \le M \{ \|z_{n+1} - z^0\|^2 \} \le \|z_1 - z^0\|^2 + C \sum_{k=1}^{n} a_k^2 \alpha_k^2
+ 2 \sum_{k=1}^{n} a_k M \{ F_k^q(z_k) - F_k^q(z^0) \}, \qquad n = 1, 2, \ldots$$

This follows from the properties of the mathematical expectation (the unconditional expectation
of the conditional expectation is equal to the unconditional expectation). Hence

$$\sum_{k=1}^{\infty} a_k M \{ F_k^q(z_k) - F_k^q(z^0) \} > -\infty, \quad \text{since} \quad \sum_{k=1}^{\infty} a_k^2 \alpha_k^2 < \infty.$$

By definition, $F_n^q(z^0) = u^0$ for $z^0 \in Z^0$; hence

$$\sum_{n=1}^{\infty} a_n M \{ F_n^q(z_n) - u^0 \} > -\infty.$$

Consequently,

$$\sum_{n=1}^{\infty} a_n M |F_n^q(z_n) - u^0| < \infty,$$

since, in addition to the previous inequality,

$$\sum_{n=1}^{\infty} a_n M \{ \max[0; F_n^q(z_n) - u^0] \} < \infty,$$

inasmuch as

$$M \{ \max[0; F_n^q(z_n) - u^0] \} \le D (1/\alpha_n)^{1/(m+q-1)},$$

which in turn follows from (3) (the measure of Y is equal to 1). Then, from conditions (6) on the
sequence $\{a_n\}$ we have

$$\liminf_{n \to \infty} M \{ |F_n^q(z_n) - u^0| \} = 0.$$


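Hence a subsequence $\{F_{n_k}^q(z_{n_k})\}_{k=1}^{\infty}$ exists, convergent to $u^0$ almost surely.


We are now able to show that $\{z_n\}$ is convergent almost surely to $Z^0$, i.e., $\rho(z_n, Z^0) \to 0$
almost surely as $n \to \infty$. For, denote by A the set

$$\{ \omega = \{y_n\}_{n=1}^{\infty} \mid F_{n_k}^q(z_{n_k}(\omega)) \to u^0 \text{ as } k \to \infty, \; \rho^2(z_n(\omega), Z^0) \text{ converges} \}.$$

By what has been proved above, the probability measure of this set P{A} = 1.


Since Z is bounded, for all $\omega \in A$ the sequence $\{z_{n_k}(\omega)\}$ has a convergent subsequence,
henceforth denoted by $\{z_{n_p}(\omega)\}$:

$$z_{n_p}(\omega) \to \tilde{z}(\omega), \qquad F_{n_p}^q(z_{n_p}(\omega)) \to u^0 \quad \text{as } p \to \infty.$$

In other words,

$$u_{n_p}(\omega) - \alpha_{n_p} \Phi_q(z_{n_p}(\omega)) \to u^0 \quad \text{as } p \to \infty.$$

Hence $\alpha_{n_p} \Phi_q(z_{n_p}(\omega))$ is bounded (since $u_{n_p}$ and Z are bounded), and

$$\Phi_q(z_{n_p}(\omega)) \to 0 \quad \forall \omega \in A, \quad \text{since } \alpha_{n_p} \to +\infty.$$

It then follows from the continuity of $\Phi_q(z)$ that $\Phi_q(\tilde{z}(\omega)) = 0$ for all $\omega \in A$. Hence,

$$\tilde{u}(\omega) \le \min_{y \in Y} f(\tilde{x}(\omega), y).$$

In addition, $\tilde{u}(\omega) \ge u^0$ (since $u_{n_p}(\omega) \to \tilde{u}(\omega)$, $u_{n_p}(\omega) - \alpha_{n_p} \Phi_q(z_{n_p}(\omega)) \to u^0$,
$\alpha_{n_p} \Phi_q(z_{n_p}(\omega)) \ge 0$). Hence

$$\min_{y \in Y} f(\tilde{x}(\omega), y) \ge \tilde{u}(\omega) \ge u^0 = \max_X \min_Y f(x, y),$$

and since $\tilde{z}(\omega) \in Z$ (where Z is closed), we have $\tilde{x}(\omega) = x^0(\omega)$, $\tilde{u}(\omega) = u^0$, i.e., $\tilde{z}(\omega) \in Z^0$
for all $\omega \in A$. Hence $\rho(z_{n_p}(\omega), Z^0) \to 0$ as $p \to \infty$ and $\rho^2(z_n(\omega), Z^0)$ are convergent; we
now finally have

$$\rho(z_n(\omega), Z^0) \to 0 \quad \text{as } n \to \infty \quad \forall \omega \in A.$$

Since P{A} = 1, the proof of the theorem is complete.


The theorem can obviously be restated as follows: $\{u_n\}$ is convergent to $u^0$ almost surely,
and any limit point of the sequence $\{x_n\}$ is some $x^0 \in X^0$ with probability unity.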


2. Let us now make some general remarks concerning the proposed method. Its merits are
simplicity of realization, low demands on smoothness, and stability with respect to computational
errors. For, in order to satisfy the conditions of the theorem, it is sufficient that

$$M \{ \xi_n^q(y_n \mid z_n) \mid y_n, z_n \} = \mathrm{grad}_z \, \chi_n^q(y_n \mid z_n) + r_n, \qquad \sum_{n=1}^{\infty} a_n \|r_n\| < \infty \qquad (9)$$

(here, $\mathrm{grad}_z$ denotes the generalized gradient with respect to z). This enables us to use approximate
methods to evaluate the generalized gradient, i.e., to find $\mathrm{grad}_x f(x_n, y_n)$ we can use $\zeta_n(x_n, y_n)$:

$$M \{ \zeta_n(x_n, y_n) \mid x_n, y_n \} = \mathrm{grad}_x f(x_n, y_n) + t_n, \qquad \sum_{n=1}^{\infty} a_n \alpha_n \|t_n\| < \infty.$$

The conditions obtained are similar to the requirements of [3].


With regard to the rate of convergence, the computation of test examples showed that a first
approximation to $u^0$, quite a short distance from $x^0$, can be reached fairly rapidly; but further
improvement of $u^0$ takes a long time, so that certain modifications of the algorithm can be
recommended for accelerating the convergence in practice (see Appendix). Of course the computer
time increases in proportion to the dimensionality of the problem.


Of course the method can be used, not only when seeking a maximin, but also when
maximizing a concave function under general constraints:

$$f(z^0) = \max_{z \in G} f(z), \qquad G = \{ z \in Z \mid g(z, y) \ge 0 \; \forall y \in Y \}, \qquad (10)$$

if it is assumed that the function g(z, y) is concave with respect to $z \in Z$, where Z is a convex
compactum.


In this case the function

$$F_n^q(z) = f(z) - \alpha_n \int_Y |\min[0; g(z, y)]|^q \, dy$$

is concave with respect to z. Further, if $z_n^0$ realizes max $F_n^q(z)$, then it follows from the
method of penalties (see [1]) that

$$\lim_{n \to \infty} F_n^q(z_n^0) = f(z^0)$$

and any limit point of the sequence $\{z_n^0\}_{n=1}^{\infty}$ proves to be a solution of problem (10). On using
the estimates of [2] (which are obtained for finite Y under rigid conditions on $g_y(z)$), we can
choose $a_n$ and $\alpha_n$ in such a way that the algorithm (8) converges with probability unity to the set
$\{z^0\}$ in the conditions demanded in [2]. Use may also be made of other estimates of the
convergence of $F_n^q(z_n^0)$ to $f(z^0)$.


The algorithm (8) can obviously be used when Y is a finite set. In particular, if Y consists 
of a single point, i.e., we are concerned with an ordinary constrained extremum problem, where 
the error is of the type (9), we obtain convergence with probability unity (when there are no errors, 
the convergence is guaranteed). This remark has something in common with the result obtained in 
[5] for penalties of a different kind, under more rigid smoothness conditions. 


Appendix 
ALGOL program and tests 


The program written below, as a procedure approx in ALGOL 60, realizes the algorithm (8)
(with provision for possible modifications in the case 0 < l ≤ 1) in the case when X is the
p-dimensional cube, Y is the m-dimensional cube, $x^i \in [0, 1]$, i = 1, 2, ..., p, $y^j \in [0, 1]$, j = 1, 2, ..., m,
and f(x, y) varies between 0 and 2. The parameters x, u of the procedure correspond to the running
values $x_n$, $u_n$, and prior to access to the procedure they have to be given their initial values. The
control constants i0, i1 are the initial and final values of n; j0 is the step in n. After j1 steps the
values of x, u are printed by means of the procedure output (x, u), which can be replaced by
another standard output procedure (according to the type of computer). The constants r, s specify
the order of decrease of $a_n$, $a_n \alpha_n$ (for instance, r = 1, s = 0.6). The procedure-function func
(p, m, x, y) has the value f(x, y). The procedure grad (p, m, x, y, g) has to evaluate the generalized
gradient of f(x, y) with respect to x and assign it to g. The parameter l, 0 ≤ l ≤ 1, serves for
different modifications of the method; the value l > 1/2 is used after obtaining the first approximation
for sufficiently large i0, 1 ≤ q ≤ 2. The procedure-function rand in the body of the procedure is a
source of random numbers, uniformly distributed in the interval [0, 1]; for concrete translators
it can be replaced by a standard procedure (e.g., p1147 (a1, rand) for the TA-1M); in this case the
initial assignment to the variables a0, a1 has to be replaced by initial access to the standard
procedure (p1147 (a1) for TA-1M).


The approx procedure is best used several times, using the approximations obtained as the
initial approximations, while increasing i0, i1, j1, until after a sufficient number of steps the upper
bound of fluctuation for u is stabilized, and the value of x is established with the necessary
accuracy*. If the fluctuations of u are sparse, j0 can be increased.


In addition to the modifications provided for in the procedure (with l ≠ 0), it is possible, in
certain problems which demand that $u^0$ be determined with increased accuracy, to introduce a
variable upper bound for u for subsequent refinements. At the initial instant it is assigned the
value $M_2$, while later in the cycle it is re-assigned the value u, which does not satisfy the constraints
(for h < 0), and at each step u is compared, not with $M_2$ (in the procedure, $M_2$ = 2), but with
the variable value introduced. We have to see to it here that the amount of variation of x is within
the given accuracy range, since otherwise the value $M_2$ must again be assigned to the bound for u.
However, in problems where 5% accuracy is sufficient, there is usually no point in making our
procedure more complicated. Some examples of rationalization of the procedure will be given,
along with the tests.


procedure approx (p, m, x, u, i0, i1, j0, j1, r, s, l, q, func, grad);
value p, m; integer p, m, i0, i1, j0, j1; array x;
real u, r, s, l, q; real procedure func; procedure grad;
begin integer i, j, k; real a0, a1, f, h, h1; array g[1 : p], y[1 : m];
  real procedure rand;
    begin real a; a := a0 + a1; a0 := a1;
      if a > 4 then a := a - 4; a1 := a; rand := a/4
    end rand;
  a0 := 3.14159265; a1 := 0.542101887; h := 0;
  for i := i0 step j1 until i1 - j1 do
  begin for j := 1 step j0 until j1 do
    begin for k := 1 step 1 until m do y[k] := rand;
      f := func (p, m, x, y);
      h1 := if f > u then 0 else -(u - f) ↑ (q - 1);
      if h1 < h then h := h1 else h1 := l × h + (1 - l) × h1;
      u := u + 1/(i + j) ↑ r + h1/(i + j) ↑ s;
      if u < 0 then u := 0; if u > 2 then u := 2;
      grad (p, m, x, y, g);
      for k := 1 step 1 until p do
      begin x[k] := x[k] - g[k] × h1/(i + j) ↑ s; if x[k] < 0 then x[k] := 0;
        if x[k] > 1 then x[k] := 1
      end
    end; output (x, u)
  end
end approx


*In the case when $X^0$ consists of more than one point, the variation of x is checked for cycling.


The test examples were computed on the BESM-4 computer for the parameters $a_n = 1/n$, $\alpha_n = n^{2/5}$
(for $a_n = a/n$ it is advisable to choose a of the order of $M_2$). The computational results are given in
Table 1.


In Example I, $f(x, y) = 1 + (x - 1/2)(y - 1/2)$; the exact solution is $x^0 = 1/2$, $u^0 = 1$. In Example
II, $f(x, y) = \exp(-(x^1 - y)^2) + \exp(-(x^2 - y)^2)$; here $x = (x^1, x^2)$ is a two-dimensional vector,
and $x^0 = (1/2, 1/2)$, $u^0 = 1.55$. In Example III, $x = (x^1, x^2)$, $y = (y^1, y^2)$,
$f(x, y) = \exp(-(x^1 - y^1)^2 - (x^2 - y^2)^2)$, $x^0 = (1/2, 1/2)$, $u^0 = 0.606$.
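As a quick consistency check of the quoted values (a sketch added here, not part of the original tests), the guaranteed value min over y of f(x^0, y) at the stated points x^0 can be evaluated on a grid of the unit cube Y. The function names and the grid size are assumptions; the check confirms the value at x^0 only, not that x^0 is the maximin point.

```python
import math
from itertools import product


def f1(x, y):
    """Example I: scalar x and y."""
    return 1.0 + (x[0] - 0.5) * (y[0] - 0.5)


def f2(x, y):
    """Example II: two-dimensional x, scalar y."""
    return math.exp(-(x[0] - y[0]) ** 2) + math.exp(-(x[1] - y[0]) ** 2)


def f3(x, y):
    """Example III: two-dimensional x and y."""
    return math.exp(-(x[0] - y[0]) ** 2 - (x[1] - y[1]) ** 2)


def guaranteed_value(f, x, m, grid=101):
    """min of f(x, y) over a uniform grid on the unit cube [0, 1]^m (end points included)."""
    pts = [k / (grid - 1) for k in range(grid)]
    return min(f(x, y) for y in product(pts, repeat=m))


if __name__ == "__main__":
    print(guaranteed_value(f1, (0.5,), 1))
    print(guaranteed_value(f2, (0.5, 0.5), 1))
    print(guaranteed_value(f3, (0.5, 0.5), 2))
```

At x^0 the minima are attained at the end points or corners of Y, giving 1, 2 exp(-1/4) and exp(-1/2) respectively, in agreement with the quoted 1, 1.55 and 0.606.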


The ¢ column gives the computing time in minutes. 


TABLE 1 





i Modifi- 
xamples ; PF , x: ; [ 
Examp | l | il cations 





I { 0 0.500 
1/2 0.500 

(0, 0) 0.504 
0.504 

(0, 0) 0.497 
0.497 


(1/2, 1/2) 1 0.494 
0.494 


(1/2, 1/2) | 4. — 0.4999 
0.4999 
(1/2, 1/2) | 4. 0.5000 
0.5000 


(0, 0) 0.50 25 1(13), (11) 
00 0.49 


tf, 1 6 0.49999 4 (12) 
Oar 'ta) 0.50000 









































We also indicate the practical modifications of the approx procedure that were employed.
For instance, the parameter l can be described not as a constant but as a variable, a real procedure
l, and specified in the body of the approx procedure as follows:

real procedure l; begin l := i/(i + j) end; (11)

The description of j1 for the variable end of the j cycle can be similarly modified:

real procedure j1; begin j1 := i/200 end; (12)

here, the i step must naturally remain constant and either be specified directly by a number, or
denoted by a new identifier. It is also convenient to use a composite i cycle of the type

for i := 0 step 50 until 10↑4, 10↑4 step 500 until 10↑5 (13)

The author thanks Yu. B. Germeier for his interest, and V. V. Fedorov for guidance and 


practical advice. 


Translated by D. E.


AN ITERATIVE METHOD WITH CHEBYSHEV PARAMETERS
FOR FINDING THE MAXIMUM EIGENVALUE AND 
CORRESPONDING EIGENFUNCTION* 


V. I. LEBEDEV 
Moscow 


(Received 6 January 1975; revised 12 July 1976) 


TO ACCELERATE the convergence of the iterations when finding the maximum eigenvalue and 
corresponding eigenfunction, a method is proposed which employs infinite sequences of Chebyshev 
parameters and guarantees stability of the computations. A generalization of Bernoulli’s method 


is constructed. 


A typical example of the class of problems, in which an iterative method which accelerates 
the convergence of the iterations is used, is the difference analogue of the boundary value problem 
in a domain D with boundary I for the many-group diffusion equations [1]: 


n 4 
—div D; grad 9:+2.9;= >: x.4p; + ~ Xi QGP, (1) 


j=1 


0 
d; Bin + olr=0, 
On 





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 100-108, 1977.

n


Qg= pa VO psPi. (3) 


jam 


It is required to find the least value of $\lambda$ and the corresponding eigenfunction $(\varphi_1, \ldots, \varphi_n)$. We
write the problem in the operator form

$$L\varphi = \frac{1}{\lambda} \chi Q \varphi. \qquad (4)$$

Assuming that the operator $L^{-1}$ exists, putting $x = Q\varphi$ and $A = QL^{-1}\chi$, and applying to
both sides of (4) the operator $QL^{-1}$, we get

$$Ax = \lambda x. \qquad (5)$$


We propose to dwell on the properties of the operators in (4) which were utilized when 
constructing our iterative method. 


Property 1. Finding the element u representing the solution of the equation Lu = y involves 
a fairly laborious iterative method, which includes both interior (for each 7) and exterior (with 
respect to 7) cycles of iterations. 


Property 2. The simultaneous storage in the computer memory of the values of the $\varphi_i$ for
several iterations demands very large capacity, while storage of $Q\varphi$ is not so difficult.


Property 3. The eigenvalues of problem (4) will be assumed to be positive; from “physical 
considerations”, a lower bound can be set for the maximum eigenvalue. 


Property 4. The two largest eigenvalues may be close to one another, i.e., the method of simple
iteration may converge slowly.


Property 5. As a rule, the maximum eigenvalue is found to a given accuracy much faster than 
the eigenfunction; to find the latter, special methods for speeding up the convergence have to be 
used, based on information obtained when finding the eigenvalue. 
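
To give Property 4 a rough quantitative reading: in the method of simple iteration the unwanted component decays like $(\lambda_2/\lambda_1)^k$, so the number of iterations needed for a given reduction grows quickly as the ratio approaches one. The Python sketch below is only an illustration under that standard assumption; the function name and the sample values are hypothetical and do not come from the paper.

```python
import math


def power_method_iterations(lam1: float, lam2: float, reduction: float) -> int:
    """Smallest k with (lam2 / lam1) ** k <= reduction.

    Assumes 0 < lam2 < lam1 and 0 < reduction < 1 (illustrative model only).
    """
    ratio = lam2 / lam1
    if not 0.0 < ratio < 1.0:
        raise ValueError("expected 0 < lam2 < lam1")
    if not 0.0 < reduction < 1.0:
        raise ValueError("expected 0 < reduction < 1")
    return math.ceil(math.log(reduction) / math.log(ratio))


if __name__ == "__main__":
    # Well separated versus nearly equal leading eigenvalues.
    print(power_method_iterations(1.0, 0.5, 1e-6))
    print(power_method_iterations(1.0, 0.99, 1e-6))
```

With a ratio of 0.99 the count is in the thousands, which is the situation the accelerated method is aimed at.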


1. Iterative method 


Let A be a linear bounded operator, specified in a Banach space B and having a complete
linearly independent system of normalized eigenelements $\varphi_1, \ldots, \varphi_n, \ldots$, corresponding to the
eigenvalues $\lambda_1 > \lambda_2 > \ldots > \lambda_n > \ldots \ge 0$, where $\lambda_n \to 0$ as $n \to \infty$, and assume that a quantity
$0 < a < \lambda_1$ can be specified a priori. Let l(x) be a linear functional of the adjoint space B*, $l_n = l(\varphi_n)$,
where $l_1, l_2 \ne 0$. It is required to find the maximum eigenvalue $\lambda_1$ and the eigenfunction $\varphi_1$ of
problem (5).


We shall use an iterative method with variable displacement, the size of which will be
determined by an infinite sequence of Chebyshev parameters, taken in a definite order. To find $\lambda_1$
we shall use the so-called T-sequence of parameters, while to find $\varphi_1$ we use the U-sequence.
These infinite sequences were described and studied in [2]; they provide convergence of the
iterations which is optimal in a definite class of errors, for a certain set of numbers of iterations
$k_i \to \infty$, and ensure computational stability with respect to rounding errors. In [3, 4], a cyclical
method with Chebyshev parameters was described, for finding the eigenfunction. To find
$\lambda_1$ and $\lambda_2$, we extend to the case of variable displacements the well-known Bernoulli method
[3, 5, 6]; this enables the eigenvalues of interest to be found more rapidly. This method is
known to provide substantial acceleration of the convergence of $\lambda_1^{(k)}$ to $\lambda_1$ when $\lambda_1$ and $\lambda_2$
are close together.


Before proceeding to find the eigenfunction, we turn our attention to the fact that the
class of initial errors is transformed after performing the iterations for finding $\lambda_1$. Before
proceeding to “uniform” suppression of the error components, we suppress the largest components
by means of four intermediate iterations.


We consider the following iterative method for finding $\lambda_1$, $\varphi_1$: given the initial approximation

$$x^0 = \sum_{n=1}^{\infty} C_n^0 \varphi_n,$$

for which we assume that $l(x^0) = 1$, $C_1^0 \ne 0$, and assuming that the error belongs to the class

$$|C_n^0| \le C_0, \qquad n = 2, 3, \ldots, \qquad (6)$$

where, here and below, $C_i > 0$ are constants, we find the approximations

$$x^k = \sum_{n=1}^{\infty} C_n^k \varphi_n$$


from the expressions 


BH Arp at, ht BM/1(Z), 


$$\beta_k = [M + m + (M - m)\cos(\pi\omega_k)]/2; \qquad (8)$$


$\{\omega_k,\ k = 1, 2, \ldots, \infty\}$ is an infinite sequence, $\omega_k \in (0, 1)$; M, m are parameters, which
we have at our disposal at each step of finding $\lambda_1$, $\varphi_1$.
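
As a minimal sketch of how displacements of the form $\beta = [M + m + (M - m)\cos(\pi\omega)]/2$ behave, the Python fragment below assumes that the $\omega_j$ are taken in natural order, $\omega_j = (2j - 1)/(2N)$, so that $\cos(\pi\omega_j)$ runs over the roots of the Chebyshev polynomial $T_N$. The function names and the natural ordering are assumptions made for illustration; the method described here uses the T-sequence ordering of [2], which matters for stability with respect to rounding but leaves the product polynomial unchanged.

```python
import math


def chebyshev_shifts(m: float, M: float, N: int) -> list:
    """Displacements beta_j = [M + m + (M - m) cos(pi * omega_j)] / 2
    with omega_j = (2j - 1) / (2N), j = 1..N (natural order, an assumption)."""
    return [
        0.5 * (M + m + (M - m) * math.cos(math.pi * (2 * j - 1) / (2 * N)))
        for j in range(1, N + 1)
    ]


def shift_polynomial(lam: float, betas: list) -> float:
    """P(lam) = prod_j (lam - beta_j): the factor by which one cycle of
    shifted iterations multiplies the component with eigenvalue lam."""
    p = 1.0
    for b in betas:
        p *= lam - b
    return p


if __name__ == "__main__":
    betas = chebyshev_shifts(0.0, 1.0, 6)
    # On [m, M] the product never exceeds 2 * ((M - m) / 4) ** N in modulus,
    # while outside the interval it grows rapidly.
    print(max(abs(shift_polynomial(i / 1000.0, betas)) for i in range(1001)))
    print(abs(shift_polynomial(2.0, betas)))
```

The components with eigenvalues in [m, M] are damped uniformly, while a component with eigenvalue above M is amplified relative to them, which is why the interval is chosen below the sought eigenvalue.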


Stage 1. We first find $\lambda_1$, $\lambda_2$. For this, we put m = 0, M = a, while as $\{\omega_k,\ k = 1, 2, \ldots, \infty\}$
we take a T-sequence [2]. We denote by T(N, p) the T-sequence for which $\{\cos(\pi\omega_k),\ k = 1,
2, \ldots, Np^n\}$ are the same as the roots of $T_{Np^n}(x)$, a Chebyshev polynomial of the first kind.
To find $\lambda_1$, $\lambda_2$, we use the T(2,3)-sequence in (8). Then, $0 < \beta_k < a$ and


= (4-8) [ot Yvan) So |[} 


n=2 


-1 


+ Praia.) Soh], 


n=2 


to ‘ify sei “Mg 1 
= [ot Ya (in) Go Pn [ut Va (an) Gln 


n=2 
yal (B*) = (a—Pr) [1+ Syme) <>]


$$\psi_k(\lambda) = P_k(\lambda)/P_k(\lambda_1), \qquad P_k(\lambda) = \prod_{i=1}^{k}(\lambda - \beta_i).$$


Let 0,=2A,/a—1, then, for A,, 2 a, we have 





T= (9n) (A, — a/2 \*-# ‘ale X" 
k n - n / 
O< Px (An) < T (01) ord <bean) ’ 
where k=max j, 2-3’<k, while for A,=[0,a@] we have [2] 
| tp (An) | <a (A) /7x (81). 


where A=A,/a—1, and a(A)=1 for k=k or for A,>2a—A, and a(A)=CA-!, C 
>0, otherwise. 


It follows from (9)-(12) that the coefficients of $\varphi_n$, $l_n$ in (9), (10) for $n \ge 2$ decrease
in modulus more rapidly than in the method of simple iteration as $k \to \infty$, while for $k = \bar k$
class-(6)-optimal suppression occurs of the coefficients $C_n^0$ for which $\lambda_n \in [0, a]$.


Assume that the values of $\beta_k$, $\beta_{k+1}$, $\beta_{k+2}$, $\gamma_k$, $\gamma_{k+1}$, $\gamma_{k+2}$ are known. Then, putting
$C_1 = C_1^{k-1} l_1$, $C_2 = C_2^{k-1} l_2$, and retaining only the first two terms in the sums (10) (on the
assumption that the terms with n > 2 are small for sufficiently large k, see (11), (12)), we
obtain the system of four equations:

$$C_1 + C_2 = D_k,$$

$$\sum_{n=1}^{2} C_n \prod_{i=0}^{m} (\lambda_n - \beta_{k+i}) = D_k \prod_{i=0}^{m} \gamma_{k+i}, \qquad m = 0, 1, 2,$$

where $D_k$ is a non-zero constant.


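This is a generalized moment system. If

$$\delta_k = \gamma_{k+1}\,\frac{\gamma_{k+2} - \gamma_{k+1} + \beta_{k+2} - \beta_{k+1}}{\gamma_{k+1} - \gamma_k + \beta_{k+1} - \beta_k},$$

$$q_k = \gamma_k\delta_k, \qquad p_k = \gamma_{k+1} + \beta_{k+1} - \beta_k + \delta_k, \qquad (13)$$

$$t_1 = p_k/2 + (p_k^2/4 - q_k)^{1/2}, \qquad t_2 = q_k/t_1,$$

where $t_1$, $t_2$ are the roots of the equation $t^2 - p_k t + q_k = 0$, then

$$\lambda_1 = t_1 + \beta_k, \qquad \lambda_2 = t_2 + \beta_k. \qquad (14)$$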
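
The extraction of the two leading eigenvalues from three consecutive displacements and normalizing factors can be sketched in Python as follows. The sketch reads the relations (13), (14) as $\delta = \gamma_{k+1}(\gamma_{k+2} - \gamma_{k+1} + \beta_{k+2} - \beta_{k+1})/(\gamma_{k+1} - \gamma_k + \beta_{k+1} - \beta_k)$, $q = \gamma_k\delta$, $p = \gamma_{k+1} + \beta_{k+1} - \beta_k + \delta$, with $\lambda_{1,2} = t_{1,2} + \beta_k$ for the roots of $t^2 - pt + q = 0$. The helper `model_gammas` is hypothetical: it produces the $\gamma$ values of a model in which only two eigencomponents are present, the case in which the four-equation system holds exactly.

```python
import math


def bernoulli_pair(beta, gamma):
    """Estimate (lambda1, lambda2) from three consecutive displacements
    beta = (beta_k, beta_k+1, beta_k+2) and factors gamma = (gamma_k, ...)."""
    b0, b1, b2 = beta
    g0, g1, g2 = gamma
    denom = g1 - g0 + b1 - b0
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("degenerate data: second component not visible")
    delta = g1 * (g2 - g1 + b2 - b1) / denom
    q = g0 * delta
    p = g1 + b1 - b0 + delta
    disc = p * p / 4.0 - q
    if disc < 0.0:
        raise ValueError("complex roots: data not consistent with two real eigenvalues")
    t1 = p / 2.0 + math.sqrt(disc)  # larger root, computed first
    t2 = q / t1                     # smaller root from the product of roots
    return t1 + b0, t2 + b0


def model_gammas(lams, coeffs, betas):
    """Hypothetical test data: gamma_k = s_{k+1} / s_k, where
    s_k = sum_n C_n * prod_{i<k} (lam_n - beta_i)."""
    weights = list(coeffs)
    gammas = []
    for b in betas:
        s_old = sum(weights)
        weights = [w * (lam - b) for w, lam in zip(weights, lams)]
        gammas.append(sum(weights) / s_old)
    return gammas


if __name__ == "__main__":
    betas = (0.5, 1.2, 0.1)
    gammas = model_gammas((3.0, 2.0), (0.3, 0.7), betas)
    print(bernoulli_pair(betas, gammas))
```

Computing the smaller root as $q/t_1$ rather than by subtraction avoids cancellation when the two roots differ greatly in size.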


This stage of the iterations, using the T-sequence, is continued on the basis of expressions
(7), (8), (13), (14) until we obtain stable values of $\lambda_1$, $\lambda_2$. Assume that this occurs for $k = k_1$,
and assume that the eigenfunction determination accuracy criterion for terminating the iterations
has not yet been satisfied.


If $\lambda_2 < a$, we continue the iterative process (7), (8) with the T sequence until the
eigenfunction is obtained to the required accuracy.


Stage 2. If $\lambda_2 > a$, the class of errors (6) is transformed after $k_1$ iterations to the class

$$|C_n^{k_1}| \le C_1|\psi_{k_1}(\lambda_n)|. \qquad (15)$$

If $\lambda \approx \lambda_2$, the function $|\psi_{k_1}(\lambda)|$ with large $k_1$ has a sharp peak; this means that the error
contains at this instant relatively large components $C_n^{k_1}\varphi_n$ for the first $n \ge 2$. We smooth the
resultant non-uniformity in the error by four iterations. For this, we first put

$$\beta_{k_1+1} = \lambda_2, \qquad \beta_{k_1+2} = 0. \qquad (16)$$

This choice of parameters eliminates the maximum of the transfer function for $\lambda = \lambda_2$, and
smooths the rounding errors appearing in the iterations in the components with large n, as a
result of the subsequent use of the U sequence. Let


P2(h) =(A—Brts) (A—Bats), D(A) =| tpa(A) (A(A2—A)) "|. 
The ideal choice of Bx4+s, Bai4,, would be that for which 
inf max |p2.(A)®@(A)I. 


By, +9 Bp +4 OmAgzA2 
1 1 


is reached. This choice presents a serious problem, however. We shall simplify it in two ways. 
First, we replace © (A) by a simpler function, retaining the characteristic properties of (A): 


©? (A) ~CoA(Aa—A) (14+ on, (24 / a—1)) / Tr.2(81) 
~C2(A(A2—A)) ""D; (A) Ae) [Tr.? (81), 


z—b 


, (x) =(1—2) (2"+6 (2—b)A (=) ) 


t=N/d2, O(x)=0 for <0, O(4)=1 for x>0, 
A=T»,(02), b=(1—s) / (1—s/4ky), s=a/ do. 


The quantities A and b are chosen in such a way that, with A=A2, i=0, 4 


di 
SS ‘aT ay. poe A — 
dz' A 2k (2A/a 1) dri 


di A/Ao—D S 


1—b 
Then, on the basis of the results of [7], we define $\beta_{k_1+3}$, $\beta_{k_1+4}$ as the solution of the
following problem: to find

$$\min_{\beta_{k_1+3},\,\beta_{k_1+4}} \int_0^{\lambda_2} p_2^2(\lambda)\,\Phi_1(\lambda/\lambda_2)\,d\lambda. \qquad (17)$$


Problem (17) can be solved in explicit form: 
Brits, hi+s=A2 (b+ (1—b) (p+m)), (18) 


where p=(tyts—tets) /2 (tyts—te”) , m= (p’— (tot.—ts”) / (t:ts—t2”) ) ”, t=1,+mi, 1.=1, my ~ 
(82. — (0.°—1)")*(4k,—3) [ (k, +4) (n/8) (1—b) 73], 1;=1;_,(4k,+ 2i—1)/ 
(4k,+2i+2), i=2, 3, 4, ma=m,(1—2y), ms=m,(5y’—4y+1), m= m,(15y*: 


—14y°—6y+1), y=[4(1—d) ]-'. 


If the required accuracy of finding $\varphi_1$ has not been achieved at this instant, the iterative
process (7), (8) can be further continued in two ways.


Stage 3'. We put $M = \lambda_2$, m = 0 in Eq. (8) and again, starting with $\omega_1$, use the T
sequence.


Stage 3''. Detailed account can be taken of the information obtained from the previous
iterations, and the parameters $\omega_k$ more accurately chosen on this basis. Let


h= max D,(A/A2) /T' x, (84). 


OmASA2 


We know that the function $\psi_{k_1+4}(\lambda)$ vanishes for $\lambda = 0$, $\lambda_2$. Then, in the class of errors
ee SCR Or—ia))*, 2, 


it is advisable to use the following parameters in the method (7), (8):

$$M = \lambda_2, \qquad m = 0, \qquad \omega_{k_1+4+p} = \alpha_p,$$

where $\{\alpha_p,\ p = 1, 2, \ldots, \infty\}$ is any U sequence [2]. By U(N) we denote the U sequence for
which $\{\cos(\pi\alpha_p),\ p = 1, 2, \ldots, 2^n(N+1) - 1\}$ are the same as the roots of the Chebyshev
polynomial of the 2nd kind of degree $2^n(N + 1) - 1$. For clarity, we shall use the U(1)-sequence
in (8).


With this choice of $\omega_k$, optimal suppression of the coefficients $C_n^{k_1+4+p}$ is achieved, for
all $p = 2^n - 1$, n = 1, 2, ..., in the class (19), and the stability of the iterative process with
respect to rounding errors is preserved.


Notes. 1. In our construction of the iterative method (7), (8), (13), (14), we took account
of certain unfavourable situations which may occur when realizing it numerically (e.g., $\lambda_2 \approx \lambda_1$, $a < \lambda_2$,
etc.). When the actual situation is favourable (the iterative process converges well), the
expressions obtained cannot slow down the convergence of the iterations.

2. For the boundary value problem (1)-(3), $Ax^k$ is evaluated by an iterative process,
which is best started with the values $(\varphi_1^k, \ldots, \varphi_n^k)$ obtained at the previous exterior iteration,
while taking account of the last normalization.


3. If, during the last stage of the iterations, the quantity $\lambda_2^{(k)}$ (see (14)) again takes a
stable value (i.e., at this instant the main part of the error is concentrated in the
eigenfunctions with eigenvalues in the neighbourhood of $\lambda_2^{(k)}$), then we put

$$\beta_{k+1} = \lambda_2^{(k)}$$

and we continue the iterations with the U sequence.


2. Construction of the T and U sequences 


The T (2, 3) sequence. We shall first define the sequence of permutations k3” by the 
following expressions [8]: x, = (1). If we know the permutation 


nea (js, cee js), 
where 1<j,<3"~", then we define the permutation x3” by the expression 


%s—= (ja, rE alii o Bo Ties, eee Dry 2:3" +h, a0 tidy “* Ji 
We put then, 


t,.=—t,. (22) 


Assume that a segment of the sequence $\{t_k,\ k = 1, 2, \ldots, 2\cdot 3^{n-1}\}$ has been constructed. Given
the permutation $\kappa_{3^{n-1}}$ (see (21)), we construct the segment $\{t_k,\ k = 2\cdot 3^{n-1} + 1, \ldots, 2\cdot 3^n\}$ from
the expressions


— 2jrastLir4s/2]) —1 


t2.3"-144141 = Sl Tt boa 144142 —basntg iret 


4-3" 





Bu : ' at 
bo.gn-t4. 414s (1—tegn-t4crge) % bo.gn-t4 4144 —bo.3"-14.4143, 


l=(),1,..., 3°-*—1. 


After this, we form $\kappa_{3^n}$ and evaluate $\{t_k,\ k = 2\cdot 3^n + 1, \ldots, 2\cdot 3^{n+1}\}$, etc.


For each $k = 2 \times 3^n$, this sequence gives the class-(6)-optimal convergence with
$\lambda_n \in [0, a]$.

The U(1) sequence. We first define the sequence of permutations $\kappa_{2^n}$ by the following
expressions [9]. If we know the permutation $\kappa_{2^{n-2}} = (j_1, \ldots, j_{2^{n-2}})$, where $1 \le j_k \le 2^{n-2}$,
then the permutation $\kappa_{2^{n-1}}$ is found from the expression

$$\kappa_{2^{n-1}} = (j_1,\ 2^{n-1} + 1 - j_1,\ \ldots,\ j_k,\ 2^{n-1} + 1 - j_k,\ \ldots).$$

We put $u_n = \cos(\pi\alpha_n)$, then,
Let the sequence $\{u_k,\ k = 1, 2, \ldots, 2^n - 1\}$ be already constructed. Knowing the permutation
$\kappa_{2^{n-2}}$, we construct the segment $\{u_k,\ k = 2^n, \ldots, 2^{n+1} - 1\}$ from the expressions


2ht41—1 
sama aaa I 


Yn+i 


W244: = Sin U2" 44141 — W224 41, 


u =(1— Ng po 
2"44142— W2r+41441) 5 W224 4143 —W22 44142) 


l=0,1,...,2"-°—14. 


After this, we form $\kappa_{2^{n-1}}$ and evaluate $\{u_k,\ k = 2^{n+1}, \ldots, 2^{n+2} - 1\}$, etc.


For each $k = 2^n - 1$, this sequence gives the optimal convergence in the class (19).
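
The doubling rule for the permutations used in the U(1) construction, in which each index j of the shorter permutation is followed by its mirror image in the doubled range, can be sketched in Python as below. The starting permutation (1) and the function names are assumptions made for illustration; the sketch covers only the index ordering, not the recurrence for the values $u_k$.

```python
def doubled_permutation(perm):
    """Given a permutation (j_1, ..., j_m) of 1..m, return the permutation
    (j_1, 2m + 1 - j_1, ..., j_m, 2m + 1 - j_m) of 1..2m."""
    m2 = 2 * len(perm)
    out = []
    for j in perm:
        out.extend((j, m2 + 1 - j))
    return out


def permutation_of_length(n_doublings):
    """Apply the doubling rule n_doublings times, starting from (1)."""
    perm = [1]
    for _ in range(n_doublings):
        perm = doubled_permutation(perm)
    return perm


if __name__ == "__main__":
    for n in range(4):
        print(permutation_of_length(n))
```

Each index is paired with its mirror image, so parameters that amplify rounding errors are immediately followed by ones that damp them; this interleaving is the usual reason given for such orderings.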


3. Program 


The ALGOL program realizing our method is as follows. The program is written in
the form of a block. There are the following correspondences between the program
identifiers and the notation of the present paper:

$$bi = \beta_{k+i-3}, \quad gi = \gamma_{k+i-3}, \quad i = 1, 2, 3, \quad l0 = \lambda_1^{(k-1)}, \quad l1 = \lambda_1^{(k)}, \quad l2 = \lambda_2^{(k)}, \quad kp = \kappa.$$

The program uses access to the procedure ITER (b3, g3, d), which, in accordance with
the values b3, $x^{k-1}$ evaluates $x^k$ from expressions (7), computes the quantities g3 and
$d = \|x^k - x^{k-1}\|$ and sends $x^k \to x^{k-1}$. In the program, k2 = 0, 1, 2, 3, depending on
whether the parameters are computed from expressions (22), (23), or (16), (18), (24), or
(25), or (20) respectively. If d < eps (this condition can easily be replaced by another),
the computation stops. For simplicity it is assumed that the blocks, connected with the
operator A and xk, xk-1, and also the quantity eps are described and specified in some
external block.
begin integer k, k1, k2, i, 7, n; real b1, b2, 63, gl, g2, g3, 10, 11, 12, 13, cs, a, d, m, 
M, p, pl, p2, y; array kp; label Al, A2, A3, A4; 
k:= 0; M: =11: =a; 12: = g2: = g3: = b2: = 63: = a2; p: = 0.261799387799, kl: = 4; 


k2: = 1; 
Al: 10: =11; gl: = g2; g2: = g3; b1: = b2; 13: = 12; 62: = b3; k: =k +1: kl: = k1 +], 


if k2 =1 then begin 

if k1 = 5 then begin cs: = 0.707106781187; go to A3 end; 

if k1 =6 then begin n: =i: =kp[1]: =1; kl: =); 

if k = 2 then begin 42: = 0: gl: = g2: b1: = b2 end else k2: = 2; go to A2 end; 

if kl =0 then b3: = M; if k1 =1 then 03: =0; 

if kl =2 then b3: =cs; if k1 = 3 then b3: = p2; 

if k1 = 4 then b3: = Mj2; go to AZ end; 

if k2 = 3 then begin b3: = 12; k2: = 2; 13: =0; kl: = k1—1; go to A¢ end; 

if k1 =5 then begin k1: =1; i: =i+1 end; 

if k1 = 3 then begin cs: = sqrt (1—cs{2); go to Ad end; 

if k1 =1 then begin if k2 =0 then begin 

j: = kp [i] + entier (kp [i] /2), cs: =sin(p X(2 Xj —1)/n) end else cs: = sin(pX(2X 
kp{i]) —D/(2 X n)); go to A3 end; 

commént iterations with parameters (22), (23), (26); 

if i=n/A kl =4 then begin 

if 2 =0 then begin i: = 3 x n — 2; 

for 7 =7 step —/ intil 1 do begin 

kp (i): =kp[j); kp (i +1): =2xXn+kp[i); kpli +2): =2 Xn+1—kpl[i]; i: =i—3 
end n:=3 Xn end else begin i: = 2 X n —1;
for j =n step —J until 7 do begin 
kp{i): =kp[j]; kp [i441]: =2xXn —1—kp|[j]; it:=i—2 end; n: =2Xnend; i: = 0 end; 
comment new x is obtained; 
A2: cs: = —cs: 
A3: b3: = M X (1 —cs) / 2; 
A4: procedure ITER (b3, d, g3); 
m: = b3 + g3; y: = g2— g1 + b2 — dl: 
if abs (y) > 10}(— 9) then begin 
y: = g2 X (m — g2 — b2) /y; 11: =(g2 +02 —b1 + y)/2; pl:=gl Xy; y:=Uf2— pl; 


if y< 0 then begin /1:=m; 12: =0; goto Al end; /1:=/1+-sqrt (y); 12:=p1/l1+01; 
11: =11 +01; 

if k2 £0 / abs (l1 — m) > abs (12 — m) then begin y: =11; 11: = 12; 12: =y end end 
else begin 11:= m; 12: =0 end; 

comment new \,, Az are obtained; 

y: = abs (l11—10); if y>eps then go to Al; 

if k2 = 0 then begin 

if a< 12 / abs ((12 — 13) /11)< 5 X 10} (—3) 

then begin comment iterations with T-sequence are terminated; 

k2: =1; p: =3Xp; kl: =4Xk + 4: 10: = sqrt (1 —a/12); 10: = (1 — 10) |(1 +10 | (k1 —4)); 
y: =1/(2X(1 — 10); 

if a: = 0 then m: =0 else begin 

m: = 2 X12/a—1; m: =(m— sgrt(mt2—1))t(2 Xk) X (k1—1) X2 Xy X sqrt (2X 
pxXk1l xX y) end; es: =1+ m; 13: =(k1 —1)/(k1 42); pl: =m X (1L—2Xy) +138; 
13: = 13 X (k1 +- 1) /(k1 +4); p2:=mxX(yX(6 Xy—441)413; 13: =mxX(yX 
(y X (15 — 14 x y) —6) +1413 X (k1 4-3) /(K1 +6); ys: =es KX p2— plt2; m= 
cs X13—plX p2; M:=p1Xl3—p2t2; p2:=m/(2xy); k:=—1,; 
m: = sqrt(p2t2—M/y); M:=12; cs: =12 xX (10+ (1—10) X(p2+™m)); p2:=12X 
(10 +-(1 — 10) X (p2 — m)); end end; 

if k2 =2 A abs ((12 — 13) /U)< 5 X 10t (—3)AU>WAW>O0AkKI< 3 then k2: =3; 

if d >eps then go to Al end; 


Computations of the problem (1)-(3), performed by S. A. Frolov, Yu. A. Vlasov,
and S. I. Konyaev, showed that the present method offers a considerable advantage in
convergence rate over the power method.


Translated by D. E. Brown 
A VARIATIONAL-DIFFERENCE METHOD FOR SOLVING TWO-DIMENSIONAL
LINEAR PARABOLIC EQUATIONS* 


Yu. R. AKOPYAN and L. A. OGANESYAN 
Leningrad 
(Received 26 May 1975; revised 4 November 1975) 


IMPLICIT variational-difference schemes are constructed for the first and third initial-boundary 
value problems for linear parabolic equations in a two-dimensional domain with a smooth boundary. 
It is assumed that the coefficients of the equation are sufficiently smooth and that the right-hand
side belongs to the space $L_2$. Order-wise exact estimates are obtained for the convergence rate of
the schemes in the norm of the energy space; the estimates are equal to the diameter of the set of 
solutions of the differential equations in this space. 


Below we construct implicit variational-difference schemes (v.d.s.) for the first and third
initial-boundary value problems for two-dimensional linear parabolic equations with a right-hand
side belonging to $L_2(\Omega \times (0, T))$, in which the time derivative is replaced by the backward
difference ratio. A discussion of v.d.s. for such problems can be found in [1-4]; there, convergence
rate estimates are obtained for the schemes under various assumptions about the smoothness of
the right-hand side. In the present paper no assumption is made about the smoothness of the
right-hand side, and it is found that the schemes are optimal in a sense for $\tau = O(h^2)$, where h is the
mesh step and $\tau$ is the time step; the convergence rate estimate is of the order O(h).


1. Notation, and construction of the v.d.s. 


1. In a cylindrical domain $Q = \Omega \times (0, T)$ with lateral surface $\Gamma$, where $\Omega$ is a bounded
simply connected domain of the space $R_2$ of points $(x_1, x_2) = (x, y)$ with boundary $S \in C^2$, we
consider the equation


with the initial condition 


and the boundary condition 





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 109-118, 1977. 
$$\left[\sum_{i,j=1}^{2} a_{ij}\frac{\partial u}{\partial x_j}\cos(\nu, x_i) + \sigma u\right]_{\Gamma} = 0, \qquad (1.4)$$


where v is the outward normal to S. 


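We denote the norm in the Sobolev-Slobodetskii space $W_2^{m,n}(Q)$ [5] by $\|\cdot\|_{m,n,Q}$, the
norm in the Sobolev space $W_2^m(\Omega)$ by $\|\cdot\|_{m,\Omega}$, and the norm in $L_2(\Omega)$ by $\|\cdot\|_{0,\Omega}$. We denote by
$V_2^{1,0}(Q)$ the space consisting of all the elements of $W_2^{1,0}(Q)$ having the finite norm

$$|u|_Q = \sup_{0 \le t \le T}\|u(t)\|_{0,\Omega} + \|\,|\nabla u|\,\|_{0,Q},$$

where

$$|\nabla u| = \Big(\sum_{i=1}^{2}\Big(\frac{\partial u}{\partial x_i}\Big)^2\Big)^{1/2}.$$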

We assume that 


aveC*(Q), bEC(Q), aeC(Q), oEC'(T), fels(Q). 


2 2 2 
O0<tUo »¥ E?< :¥ aby » 80, Lo, Ly=const. 


imei {jemi imei 


Then, the initial-boundary value problem has a unique solution $u \in W_2^{2,1}(Q)$ and we have
the estimate [6]

$$\|u\|_{2,1,Q} \le C\|f\|_{0,Q}. \qquad (1.6)$$


Throughout, the letter C with or without subscripts denotes a positive constant, regardless
of any factors that may be present.
2. We specify a positive parameter h, which we call the mesh step. 


Let us construct the mesh domain $\Omega^h$ for the third initial-boundary value problem. For this,
we superimpose a square mesh of step h on the domain $\Omega$, in such a way that the mesh lines are
parallel to the coordinate axes. A regular rectangular mesh may also be taken, in which the lengths
of sides of the cell are of order h. We divide the mesh cells into triangles by a diagonal at an angle
$\pi/4$ to the $x_1$ axis. As $\Omega^h$ we take the least union of triangles containing $\Omega$.


For problem (1.1)-(1.3) with respect to the domain $\Omega$, we define the mesh domain $\Omega^h$ with
boundary $S^h$, which satisfies the following conditions [7, 8]: 1) the domain $\Omega^h$, bounded by the
step-line $S^h$, lies in the domain $\Omega$; 2) between points of the step-line $S^h$ and S we establish a
one-to-one correspondence with the aid of the normals to S; 3) the lengths of the sections of the
step-line $S^h$ are bounded from below by $lh$; 4) the distances from points of $S^h$ to S do not exceed
$\delta h^2$. We then divide the domain $\Omega^h$ into triangles [7], the lengths of sides of which lie in the
range $[l_1h, l_2h]$, and their areas in the range $[s_1h^2, s_2h^2]$. Here, the positive constants $l$, $\delta$, $l_1$,
$l_2$, $s_1$, $s_2$ are independent of h. The choice of these constants is determined by the properties of
the curve S and the algorithm for constructing $\Omega^h$.


Henceforth, unless stipulated otherwise, $\Omega^h$ means the mesh domain, constructed for the
first or third initial-boundary value problem.
The set of vertices and sides of the triangles of the triangulation form a mesh, and the
vertices of the triangles will be called the mesh base-points. We shall assume that all the base-points
are numbered in a certain order. We denote by (m) the m-th base-point $(x_m, y_m)$. We introduce
the following notation: $\bar R^h$ is the set of base-points belonging to $\bar\Omega^h$; $r^h$ is the set of base-points
belonging to $S^h$; and $R^h$ is the set of base-points belonging to $\Omega^h$.


We put $Q^h = \Omega^h \times (0, T)$. We divide the interval [0, T] into equal parts with the step
$\tau$, $t_n = n\tau$, n = 0, 1, ..., N.


For each base-point $(m) \in \bar R^h$ we define the function $\varphi_m(x, y)$, which is equal to unity at
the base-point (m), and to zero at the other base-points, while it is interpolated piecewise linearly
in $\Omega^h$. Outside $\Omega^h$ the function is identically zero. We put

$$\varphi_{mn}(x, y, t) = \begin{cases} \varphi_m(x, y), & \text{if } t \in (t_{n-1}, t_n], \\ 0, & \text{if } t \notin (t_{n-1}, t_n], \end{cases}$$

where $(m) \in \bar R^h$, n = 1, 2, ..., N.


Let v={Um} and Ww={W,n,} be the mesh functions, specified at the points (2m, Ym)and 
(Xm, Ym, t,) respectively. We introduce the notation: 


o(z,y)= x. Um@m (zy), 


(m) ERR 


@ (x, y, t)= y 2 WmnQmn (x,y, t), 


n=1 (m) 


con® py WmnPmn (2, Y, t). (1.8) 


n=1 (m)eERr 


The set of functions of the type (1.7) will be denoted by H;,,, and of the type (1.8), by H;,-. 


The arguments x, y, and also ¢, will sometimes be omitted when writing a function. 


3. We put 


F(u, oe foe age V fost ouge ae 


i,j=i Q i=iQ 


2, (u, 0) =P, (u, 0) +f ou® dy, 
Tr 


where, here and below, dO=dzdydt. 


For the first initial-boundary value problem, as the approximate solution we take the function 
veH at, Which satisfies the integral identity 


oY f lta) )G (ta) dQ+Z (6, 9) = J1G40 (1.9) 


n=i Q 


for arbitrary EA, where (v (ta) )i is t7!(V (tn) —v(t,.,)). Here and below dQ=dzdy. 
As the approximate solution of the third initial boundary value problem we take a function
d<=H,,, which satisfies the integral identity 


N 


Vf ((t)) (44) d0+25(6, 9) =f 140 (1.10) 


n=i Q 


for arbitrary 6=Ap,. 


2. Approximation theorems 


Denote by $u^h$ the Steklov average of the function u:

$$u^h(x, y, t) = \frac{1}{4h^2}\int_{x-h}^{x+h}\int_{y-h}^{y+h} u(\xi, \eta, t)\,d\xi\,d\eta.$$
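
As a small numerical illustration of the Steklov average over the square of half-width h, the Python sketch below approximates the double integral with a midpoint rule. The normalization by the area $4h^2$, the function name and the quadrature resolution are assumptions of the sketch. For $u = x^2 + y^2$ the average can be found by hand: it equals $x^2 + y^2 + 2h^2/3$, so the averaging perturbs a smooth function by a term of order $h^2$.

```python
def steklov_average(u, x, y, h, n=100):
    """Midpoint-rule approximation of
    (1 / (4 h^2)) * integral of u over [x-h, x+h] x [y-h, y+h]."""
    step = 2.0 * h / n
    total = 0.0
    for i in range(n):
        xi = x - h + (i + 0.5) * step
        for j in range(n):
            eta = y - h + (j + 0.5) * step
            total += u(xi, eta)
    return total * step * step / (4.0 * h * h)


if __name__ == "__main__":
    h = 0.1
    approx = steklov_average(lambda s, t: s * s + t * t, 0.3, -0.2, h)
    exact = 0.3 ** 2 + 0.2 ** 2 + 2.0 * h * h / 3.0
    print(approx, exact)
```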


Theorem | 


If w=W?! (Q) and is continued into the space R3 of points (x, y, t) while retaining its 
class and norm [5], then we have 


lu—u" | on<C (h+1'*+th-') lwll2, +, @ 


ju—ubllo, eC (h?+1) llulle, 1. ¢. 
Proof. We have the inequality 
i | V (u—u") | Ilo, gs | | V (u—u") | Ilo, grt || | V (ux—w") | llo, Q. 


Further, 


Mo 


NIV (a2) loon < { JV (a(t) —u¥(t)) Iw at } 


+{ Suv ee) 2) Iho ay". 


Noting that w*(t) =u" (tn) for t=(tn-s, tr], we get 


/a 
<Cth-"|lulle,s,¢. 


{ fi IV (u(t) —u*(t)) | lac" at} 


The following estimates hold (see e.g., [8] ): 


I | V (u—u") | llo, SCh|lul., 1, Q» 


pees a 
{ fl | V (u* (t) —u"(t) ) loan dt } <Chllulls,s.¢. 
It follows from (2.3)-(2.7) that


IV (wut) | lle, eC (A+ th) [lull 4, @ 
We can show in just the same way that 


sup [lu (t)—u*(t) llogr<Cr"llulls,g+Ch sup [lz (2) I:,0%, 


O<giggT 0QieT 


whence, using the inequality [5] 


sup ||w(t) ll1o»<Cllull2.1.¢, 
0<(<T 
we obtain 


sup [lw (t)—u*(£) lloox<C (h+1") llullo.1.¢. 


Omit 


From (2.8) and (2.9) we get inequality (2.1). 


The proof of inequality (2.2) is essentially the same as the proof of (2.1) and may therefore 
be omitted. The theorem is proved. 


Let $\Omega^h$ be the mesh domain constructed for the first initial-boundary value problem. We have:


Theorem 2 


Let the function w=W2! (Q), w|r=O0 and be continued into R3 while retaining its class 
and norm [5]. Then, we have the inequalities 


lu—w"Ile<C(h+1"+th-) llulle, 1, (2.10) 


llu—w" Ilo, o<C(h? +7) llulla, 1, o. 


Proof, Since u*=0 in Q\Q*,we have 


[ua | o<|U|Q. ont |u—a"| gat |ur—u" |’. 


Noting that the width of the strip Q \ Q” is of order h2, we obtain [8, 9] 
lwlagr<Chllulls, 1, o (2.13) 
The second term on the right-hand side of (2.12) is estimated in accordance with Theorem 1. 


Let w be a sufficiently smooth function, specified in Q, and such that w|,=Q. Since the 
function w"(t) —w"(t) is non-zero only in the triangles of which at least one vertex lies on S”, 
we have 
IV (wow) I lo,qr<Cth- I wlle,s.o+ { SJ lw" (am Yost)


(myer? 0 


T 


-w (em ym t) Pat} + | >. flr Gem yt) Fae} 


(myer? 0 


The following estimates for the second and third terms on the right-hand side of (2.14): 


{ 2 fam Ym; £) —W (Xm, Ym, t) |* dt } 


(m)erh 0 


la 
<Ch||wl21,0; (2.15) 


"Vo 
{ y f 20 (ns Yon t) I? at } <Chllwlles.¢ 


(myer 0 


were in fact obtained in [8]. 
From (2.14)—(2.16) we have 


| V (w'—w") | Ilo, e<C (A+ th) [lwlle, 4, o (2.17) 


We have proved this inequality for a sufficiently smooth function w. It obviously also holds 
for uw=W?2 (Q). 


We can show in the same way that 


sup |[u*(t)—1" (t) lloo»<C (h+7") |lullo,1,0. (2.18) 


OQt<T 


Inequality (2.10) follows from (2.12), (2.13), (2.17), (2.18) and (2.1). The proof of inequality 
(2.11) is similar. The theorem is proved. 
3. Convergence rate estimates 


Let us now consider the rate of convergence of the approximate to the exact solution. Here
we have:


Theorem 3


Assume that the conditions (1.5) and the estimate (1.6) hold; then, for $\tau = O(h^2)$, assuming
that h is sufficiently small, we have


lu—vle<C(T)Allfllo, os (3.1) 
lu—BIgSC(T) hllfllo, Q) (3.2) 


where C(7) is a positive constant, dependent on T. 
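
A short worked computation may help to see why the coupling $\tau = O(h^2)$ is the natural one. It reads the approximation estimates of Theorems 1 and 2 as bounds of the form $C(h + \tau^{1/2} + \tau h^{-1})\|u\|_{2,1,Q}$ (an interpretation of (2.1) and (2.10)) and takes $\tau = ch^2$ with a hypothetical constant $c > 0$:

```latex
h + \tau^{1/2} + \tau h^{-1}
  \;=\; h + c^{1/2} h + c\,h
  \;=\; \bigl(1 + c^{1/2} + c\bigr)\,h .
```

All three terms are then of the same order and the bound is O(h). With a coarser coupling such as $\tau = ch$, the middle term becomes $c^{1/2}h^{1/2}$ and dominates, so the first-order rate in h would be lost.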
Proof. We start with the first initial-boundary value problem. Given any eek, + we have


Vf (ui (w (tn) ):G (tn) dQ4+L, (u, =| wae. (33) 


al 


Denote the function u"—v by w. Taking w as ¢, we obtain from (1.9) and (3.3): 


t bd f (i (tn) )it (tp) Q+L, (wo iv) 


(a" (t,.) —w (tp) )W (tr) dQ+H, (a"—u, wv). 


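We transform the first term on the left-hand side of (3.4):

$$\sum_{n=1}^{N} \tau \int_{\Omega} (w(t_n))_{\bar t}\, w(t_n)\, d\Omega = \frac{1}{2}\int_{\Omega} |w(T)|^2\, d\Omega + \frac{\tau^2}{2}\sum_{n=1}^{N}\int_{\Omega} |(w(t_n))_{\bar t}|^2\, d\Omega. \qquad (3.5)$$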
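
The transformation of a sum of the form $\sum_n \tau\,(w(t_n))_{\bar t}\,w(t_n)$ rests on the elementary identity $(a - b)a = \frac12(a^2 - b^2) + \frac12(a - b)^2$ applied at each time level. The Python check below is a sketch with scalar values standing in for the spatial integrals; the function name and the sample data are hypothetical. It keeps the term $-\frac12 w(t_0)^2$, which drops out when the initial value vanishes.

```python
def backward_difference_identity(w, tau):
    """For samples w = [w_0, ..., w_N] return (lhs, rhs), where
    lhs = sum_n tau * ((w_n - w_{n-1}) / tau) * w_n and
    rhs = (w_N^2 - w_0^2) / 2 + (tau^2 / 2) * sum_n ((w_n - w_{n-1}) / tau)^2."""
    diffs = [(w[n] - w[n - 1]) / tau for n in range(1, len(w))]
    lhs = sum(tau * d * w[n] for n, d in enumerate(diffs, start=1))
    rhs = 0.5 * (w[-1] ** 2 - w[0] ** 2) + 0.5 * tau ** 2 * sum(d * d for d in diffs)
    return lhs, rhs


if __name__ == "__main__":
    samples = [0.0, 0.3, -0.1, 0.8, 0.5]
    print(backward_difference_identity(samples, 0.25))
```

The second term on the right is non-negative, which is what makes the backward difference scheme dissipative and the estimate that follows possible without a restriction on the sign of the increments.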


We find an upper bound for the first term on the right-hand side of (3.4): 


oe Si (a (t,) —u (ta) ) 00 (t,) dQ= Joa (T) —u(T))w(T)dQ 


N==i Q 


ey J (t,-1) —w (tas) ( (ta) Jed 


N=i Q 


N 
<e'lu—alote, | |w(T) |?dQ+e,-! be J lw (tn) —u"(t,) |? dQ 


n=1i Q 


eke J | (W (tn) zl? dQ. 


n=i Q 


A bound is easily obtained for the second term on the right-hand side of (3.4): 
L,(u'—u, w)<Ce“|u—u"| 2+e |w] .?. 
Putting ey='/, and@2="/2 in (3.6), we find from (3.4)—(3.7) that 


= fier dQ+L, (w, bell f lu (t,) —“a*(t,) [2 dQ 


n=1 Q 


+C€ (A+e7') lu—u"le?telwle’. 







The following bound can be obtained for the first term on the right-hand side of (3.8):


y } | (t,)—u*(tn) |? dQ=1-! >. j ftw (t,) a (2) |? dQ dt 


n=i Q n=1 th, Q 


<Crllulle.¢+Ct-'|lu—u"llo.o- 


From (3.8) and (3.9) we obtain 


1 is — . 
—{ lw (7) |? dQ+L, (w, @) <elwle2+C (tIlulles.e 
4 


+1-Ju—u"llo ot A te7!) Ju—u" lg?) =I. 


It can easily be shown that 
P, (ib, @) >Cill | Vol llee—Call love. 
From (3.10) and (3.11) we get 


J 1 (7)? dQ+I|| Vid leg<Collddlleg+Cul. 


Using similar arguments, we can show that, for any ¢,,n =1,2,...,N, we have 
tn 
flm@,) Page, | { l(t) taQat+e. 
Q °o 2 
Using (3.12) and (3.13), we can easily see that 
lwlo<c(T)I". 
On making a suitable choice of € in the expression for / and using Theorem 2 and the last 


inequality, we finally get 


|u—V1¢<lu—t"lgtlwle<C (T)hllulls, 1, e<C (7) hllfllo, o. 


The estimate (3.2) can be proved similarly. The theorem is proved. 


Notes. 1. For the first initial-boundary value problem, the v.d.s. can be written in a somewhat 
different way: as the approximate solution we take a function v<Hp, x, which satisfies the 


equation 


T (Usa)? fon dQ+L, (v, Pmn) _— f forme dQ 
a @ (3.14) 


for all(m) =R* andn=1, 2,..., N. The convergence rate estimate is the same as before, but this 
latter scheme is more convenient. 
2. Since the v.d.s. considered are implicit, the question of their numerical realization arises.
Noting that $\tau = O(h^2)$, we can show that the systems of equations (1.9), (1.10) and (3.14) can be
solved by the method of simple iterations with accuracy ¢ after O(h-?|In eh|) iterations. This 
subject will be dealt with in greater detail in later papers. 


4. On the accuracy of the estimates 


We shall consider approximate methods in which the approximate solution of the
initial-boundary value problem, with right-hand side belonging to $L_2(Q)$, is sought as an element
of some R-dimensional subspace in $V_2^{1,0}(Q)$, with a basis consisting of standard functions.


The set K of solutions of the initial-boundary value problem with right-hand side belonging
to the sphere $\|f\|_{0,Q} \le 1$ is a compact set in $V_2^{1,0}(Q)$ [*].


Our problem lies in finding the accuracy to which an R-dimensional hyperplane approximates
the compact set K, i.e., in obtaining lower and upper bounds for the Kolmogorov R-diameter
$d_R(K)$ of K [10]:

$$d_R(K) = \inf_{L_R \subset V_2^{1,0}(Q)}\ \sup_{u \in K}\ \inf_{v \in L_R} |u - v|_Q,$$

where $L_R$ is an R-dimensional subspace in $V_2^{1,0}(Q)$.
Since $\|w\|_{1,0,Q} \le |w|_Q$, we have

$$d_R(K) \ge d_R'(K), \qquad (4.1)$$

$$d_R'(K) = \inf_{L_R \subset W_2^{1,0}(Q)}\ \sup_{u \in K}\ \inf_{v \in L_R} \|u - v\|_{1,0,Q}.$$

We shall assume for simplicity that $T = \pi$ and that the domain $\Omega$ contains the rectangle
$\{0 < x < \pi,\ 0 < y < \pi\}$. We denote by P the cube $\{0 < x < \pi,\ 0 < y < \pi,\ 0 < t < \pi\}$.


Notice that every subspace $L_R$ defines in $W_2^{1,0}(P)$ a subspace, the dimensionality of which
is not greater than the number R. Hence

$$d_R'(K) \ge \inf_{L_R \subset W_2^{1,0}(P)}\ \sup_{u \in K}\ \inf_{v \in L_R} \|u - v\|_{1,0,P}. \qquad (4.2)$$

We shall assume for simplicity that R = MN, where $\sqrt M$ and N are integers.


We consider in $W_2^{1,0}(P)$ the subspace G, whose basis is formed by the functions
sin kx sin ly sin nt for $k, l = \sqrt M, \sqrt M + 1, \ldots, 2\sqrt M$ and n = 1, 2, ..., N + 1.

It is easily shown by direct calculations that, for any $u \in G$,

$$\|u\|_{1,0,P} \sim \Big\{\sum_{k,l=\sqrt M}^{2\sqrt M}\ \sum_{n=1}^{N+1} u_{kln}^2\,(k^2 + l^2)\Big\}^{1/2},$$

$$\|u\|_{2,1,P} \sim \Big\{\sum_{k,l=\sqrt M}^{2\sqrt M}\ \sum_{n=1}^{N+1} u_{kln}^2\,\big((k^2 + l^2)^2 + n^2\big)\Big\}^{1/2},$$

where $u_{kln}$ are the Fourier coefficients of the function u. Then,

$$\|u\|_{1,0,P} \ge C\Big(\frac{M}{M^2 + N^2}\Big)^{1/2}\|u\|_{2,1,P}. \qquad (4.3)$$

Consider an arbitrary subspace $L_R$ in $W_2^{1,0}(P)$. Since the dimensionality of $L_R$ is less
than that of G, there will be an element $u_{R,P}$ in G which is orthogonal to $L_R$ in $W_2^{1,0}(P)$.
Then, for any $v \in L_R$,

$$\|u_{R,P} - v\|_{1,0,P} \ge \|u_{R,P}\|_{1,0,P}. \qquad (4.4)$$

We continue the function $u_{R,P}$ into the whole of Q while preserving the class and norm of
$W_2^{1,0}(P)$, in such a way that the function vanishes close to $\Gamma$ and $u_{R,P}|_{t=0} = 0$. The function will
then satisfy the initial and boundary conditions (1.2)-(1.4). We shall assume that $\|u_{R,P}\|_{2,1,P} = (C_5C_6)^{-1}$,
where $C_5$ and $C_6$ are the constants in the inequalities

$$\|u\|_{2,1,Q} \le C_5\|u\|_{2,1,P}, \qquad \|\mathcal{L}u\|_{0,Q} \le C_6\|u\|_{2,1,Q}.$$

It follows from what has been said that $u_{R,P} \in K$.


Then we find from (4.3) and (4.4) that, given any subspace $L_R$ in $W_2^{1,0}(P)$,

$$\sup_{u \in K}\ \inf_{v \in L_R} \|u - v\|_{1,0,P} \ge \|u_{R,P}\|_{1,0,P} \ge C\Big(\frac{M}{M^2 + N^2}\Big)^{1/2}. \qquad (4.5)$$

From (4.1), (4.2), and (4.5) we obtain the inequality

$$d_R(K) \ge C\Big(\frac{M}{M^2 + N^2}\Big)^{1/2},$$

whence it follows that, for M ~ N,

$$d_R(K) \ge CR^{-1/4}.$$

An upper bound for the diameter is provided by the estimates (3.1) and (3.2), which
likewise are of order $R^{-1/4}$.


In short, upper and lower bounds of the same order of accuracy have been obtained. Hence
the estimates (3.1) and (3.2) are not improvable with respect to order.
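
The matching of the two orders can be made explicit with a short computation. It assumes M = N for definiteness, so that $R = MN = M^2$, and counts the unknowns of the scheme heuristically as (number of base-points) times (number of time levels); this count is an assumption of the sketch rather than a statement taken from the text.

```latex
\Bigl(\frac{M}{M^{2}+N^{2}}\Bigr)^{1/2}
  = \Bigl(\frac{1}{2M}\Bigr)^{1/2}
  = 2^{-1/2}\,R^{-1/4}
  \qquad (M = N,\ R = M^{2}),
\\[4pt]
R \;\sim\; h^{-2}\cdot\tau^{-1} \;\sim\; h^{-2}\cdot h^{-2} \;=\; h^{-4}
  \quad\Longrightarrow\quad h \;\sim\; R^{-1/4}.
```

So an error of order h obtained with $\tau = O(h^2)$ corresponds to an error of order $R^{-1/4}$ in terms of the dimension R of the approximating subspace, the same order as the lower bound for the diameter.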


Translated by D. E. Brown 
ON NUMERICAL ISOLATION OF THE BOUNDED SOLUTIONS OF SYSTEMS
OF LINEAR PARTIAL DIFFERENTIAL EQUATIONS OF 
THE EVOLUTIONARY TYPE* 


SH. M. NASIBOV 
Baku 


(Received 24 February 1976) 


NUMERICAL isolation of the bounded solutions is discussed for certain systems of linear partial 
differential equations of the evolutionary type. Boundedness of the solutions at infinity, or at 
singular points, where the coefficients become infinite, is taken as the boundary condition at the 
relevant points. The manifold of bounded solutions is isolated without investigating the complicated 
asymptotic behaviour of the particular solutions at these points. An applied problem is considered
as an example.


1. Isolation of the solutions, bounded at infinity 


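1. We consider a linear differential operator acting on vector functions $u = (u_1, \dots, u_q)^T$ of two independent
variables x and t:

$$P\Bigl[\frac{\partial}{\partial t}, \frac{\partial}{\partial x}\Bigr] u = H_2\Bigl[\frac{\partial}{\partial t}\Bigr] u - L_2\Bigl[\frac{\partial}{\partial x}\Bigr] u,$$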





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 119-135, 1977. 
where $H_2[\partial/\partial t]$ is a linear differential operator of the form $H_2[\partial/\partial t] = A_0\, \partial^2/\partial t^2 + A_1\, \partial/\partial t$, $A_0$,
$A_1$ are constant square matrices of order q, and $L_2[\partial/\partial x]$ is a second-order linear differential
operator of the form $L_2[\partial/\partial x] = \partial^2/\partial x^2 + B(x)$, where B(x) is a square matrix of order q,
whose elements depend only on the one independent variable x.


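Suppose that we want to find the vector function u, representing a solution of the equation

$$P\Bigl[\frac{\partial}{\partial t}, \frac{\partial}{\partial x}\Bigr] u = f(x)\mu(t)$$ (1.1)
in $Q_{a,T} = R_a\{x \mid a < x < \infty\} \times (0, T)$, where T is arbitrary, under the initial conditions
$$u(x, 0) = \varphi_0(x), \quad (\partial u/\partial t)|_{t=0} = \varphi_1(x).$$ (1.2)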


At the left-hand end x = a of the semi-infinite interval, the function u(x, t) satisfies a linear
boundary condition, and as $x \to \infty$, the condition

$$\sup |\exp(-\gamma_k t)\, u_k(x, t)| = O(1),$$ (1.3)

where $\gamma_k$ are non-negative constants. We assume that the matrix B(x), the initial conditions, and
the function f(x), are continuous functions in $[a, \infty)$.


When integrating problem (1.1)—(1.3) numerically, the question arises of the correct 
translation of condition (1.3) from infinity to a finite point. 


We shall assume that $\mu(t)$, u, $\partial u/\partial x$, $\partial^2 u/\partial x^2$, regarded as functions of t, are originals.
We denote the image of u by

$$v(x, p) = \int_0^{\infty} e^{-pt} u(x, t)\, dt,$$

and the image of $\mu(t)$ by

$$M(p) = \int_0^{\infty} e^{-pt} \mu(t)\, dt.$$
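As an illustration of the passage from an original to its image, the following minimal sketch evaluates the Laplace integral numerically for the test original exp(at) and compares it with the known image 1/(p - a). The helper name `laplace_image`, the truncation point `t_max`, the real value of p, and the trapezoidal rule are all assumptions made for this example; they are not part of the method of the paper.

```python
import math

def laplace_image(original, p, t_max=40.0, steps=200_000):
    """Trapezoidal estimate of the integral of exp(-p*t) * original(t) over [0, t_max].

    The infinite upper limit is truncated at t_max (an assumption that is
    harmless when the integrand decays quickly).
    """
    h = t_max / steps
    total = 0.5 * (original(0.0) + math.exp(-p * t_max) * original(t_max))
    for i in range(1, steps):
        t = i * h
        total += math.exp(-p * t) * original(t)
    return total * h

a = 0.5          # growth exponent of the test original exp(a*t)
p = 2.0          # real value of the transform parameter with p > a
numeric = laplace_image(lambda t: math.exp(a * t), p)
exact = 1.0 / (p - a)
print(numeric, exact)
```

The image exists only for Re p > a, which mirrors the role played by the growth exponents in the half-plane conditions used throughout the text.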
We transform to images in (1.1). We obtain as a result the system of ordinary differential
equations

$$d^2 v(x, p)/dx^2 + [B(x) - H_2(p)]\, v = -g(x, p),$$ (1.4)

where $g(x, p) = F(x, p) + F_0(x, p)$, $F(x, p) = f(x) M(p)$, $F_0 = A_0 \varphi_0 p + A_0 \varphi_1 + A_1 \varphi_0$, $H_2(p) = A_0 p^2 + A_1 p$, with the condition

$$|v(x, p)| = O(1) \text{ as } x \to \infty \text{ uniformly with respect to } p.$$ (1.5)

The matrix B(x), the functions $\varphi_0(x)$, $\varphi_1(x)$ and f(x) have specified asymptotic forms
as $x \to \infty$:
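It follows from [1] that, for large x, the manifolds of bounded solutions of system (1.4)-(1.6)
are described by the equation

$$dv(x, p)/dx = \alpha(x, p)\, v + \beta(x, p).$$ (1.7)

Here, for $\alpha$ we have

$$d\alpha/dx + \alpha^2 + B(x) - H_2(p) = 0,$$ (1.8)

$$\alpha|_{x \to \infty} \to \alpha_0,$$ (1.9)

$$\alpha(x, p) \sim \sum_{k=0}^{\infty} \frac{\alpha_k}{x^k} \text{ as } x \to \infty.$$ (1.10)

Here $\alpha_0$ is a square matrix, dependent on the complex parameter p, such that $\alpha_0^2 = H_2(p) - B_0$; all
its eigenvalues must have negative real parts. We shall see later that such a choice is possible and is
unique. Further, we have for $\beta(x, p)$:

$$d\beta/dx + \alpha(x, p)\beta(x, p) = g(x, p),$$ (1.11)

$$\beta|_{x \to \infty} \to \beta_0 = \alpha_0^{-1} g_0,$$

$$\beta(x, p) \sim \sum_{k=0}^{\infty} \frac{\beta_k}{x^k},$$

where $g_0 = M(p) f_0 + A_0 \varphi_{0,0}\, p + A_0 \varphi_{1,0} + A_1 \varphi_{0,0}$.
The matrices $\alpha_m$ and $\beta_m$ are determined formally from the recurrence relations

$$\sum_{l+j=m;\ l,j \ge 0} \alpha_l \alpha_j + B_m = (m-1)\alpha_{m-1},$$ (1.13)

$$\alpha_0 \beta_m = g_m + (m-1)\beta_{m-1} - \sum_{l+j=m;\ l,j>0} \alpha_l \beta_j - \alpha_m \beta_0,$$ (1.14)

$$g_m = M(p) f_m + A_0 \varphi_{0,m}\, p + A_0 \varphi_{1,m} + A_1 \varphi_{0,m}, \quad m = 1, 2, \dots$$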
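The recurrence for the coefficients of the asymptotic series of the matrix in the manifold equation can be tried out in the simplest setting. The sketch below treats the scalar case q = 1 with a real value of H_2(p), reads the recurrence as: the sum of alpha_l * alpha_j over l + j = m, plus B_m, equals (m - 1) * alpha_{m-1}, and then checks that the truncated series nearly satisfies the Riccati equation d(alpha)/dx + alpha^2 + B(x) - H_2(p) = 0 for large x. The function names, the sample coefficients of B(x), and the restriction to real scalars are assumptions made for illustration only.

```python
import math

def alpha_coefficients(h2, b, count):
    """Scalar (q = 1) coefficients alpha_0..alpha_{count-1} of the series sum alpha_k / x**k.

    h2 stands for H_2(p) at a real p; b[m] are the coefficients B_m of B(x) = sum B_m / x**m.
    """
    alpha = [-math.sqrt(h2 - b[0])]          # root of alpha_0**2 = H_2 - B_0 with negative sign
    for m in range(1, count):
        cross = sum(alpha[l] * alpha[m - l] for l in range(1, m))
        b_m = b[m] if m < len(b) else 0.0
        # 2*alpha_0*alpha_m + cross + B_m = (m - 1)*alpha_{m-1}
        alpha.append(((m - 1) * alpha[m - 1] - b_m - cross) / (2.0 * alpha[0]))
    return alpha

def riccati_residual(alpha, h2, b, x):
    """Left-hand side of the Riccati equation evaluated on the truncated series."""
    a = sum(c / x**k for k, c in enumerate(alpha))
    da = sum(-k * c / x**(k + 1) for k, c in enumerate(alpha))
    bx = sum(c / x**k for k, c in enumerate(b))
    return da + a * a + bx - h2

h2 = 4.0                     # hypothetical value of H_2(p)
b = [1.0, 0.5, 0.2]          # hypothetical B_0, B_1, B_2
alpha = alpha_coefficients(h2, b, 6)
res_50 = riccati_residual(alpha, h2, b, 50.0)
res_100 = riccati_residual(alpha, h2, b, 100.0)
print(alpha, res_50, res_100)
```

With six coefficients the residual should decay like x^(-6), which is the sense in which the series is asymptotic as x grows.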
It follows immediately from (1.7) that the boundary value problem in $(a, \infty)$ for system
(1.4) with the given boundary condition at the point x = a and the condition $|v(x, p)| \to 0$ as
$x \to \infty$ can be reduced to the equivalent problem in $[a, x_\infty]$, where, as the boundary condition at
the point $x_\infty$, we write the equation of the manifold (1.7), stable as $x \to \infty$. After Laplace inversion,
the relation obtained in (1.7), if it exists, describes the nature of the manifold of stable solutions
as $x \to \infty$ of Eq. (1.1), which has an irregular singularity at infinity. Because of this, condition
(1.3) for Eq. (1.1) can be effectively "displaced" from infinity to a finite point $x_\infty$, and the above
boundary value problem (1.1)-(1.3) in $[a, \infty)$ reduces to the equivalent problem in $[a, x_\infty]$.


2. We will consider the case when, in Eq. (1.1), $A_0 = 0$, $A_1 = E$, where E is the unit matrix.
With a view to reducing the amount of calculations, we put f = 0, $\varphi_0 = 0$. (The case $f \ne 0$, $\varphi_0 \ne 0$ is
treated in a similar way.) In short, we consider the system of linear parabolic equations

$$\frac{\partial u_j}{\partial t} = \frac{\partial^2 u_j}{\partial x^2} + \sum_{k=1}^{q} B_{jk}(x)\, u_k, \quad 1 \le j \le q.$$ (1.15)

We assume that all the eigenvalues $\lambda_1, \dots, \lambda_q$ of the matrix $B_0 = B(\infty)$ are simple. We can
then assume without loss of generality that $B_0$ is a diagonal matrix, with $\lambda_1, \dots, \lambda_q$ along the
diagonal.


In the case of the system obtained by Laplace transformation from (1.15), Eq. (1.8) and condition
(1.9) become

$$\alpha' + \alpha^2 + B(x) - pE = 0, \quad \alpha|_{x \to \infty} \to \alpha_0.$$ (1.16)

Here, $\alpha_0$ is found from the matrix equation $\alpha_0^2 = pE - B_0$, containing the complex parameter p,
with a condition such that the eigenvalues have a negative real part. Denote by $p_0$ the maximum
real part of any eigenvalue of the matrix $B_0$, i.e., we put
$$p_0 = \max_{1 \le k \le q} \operatorname{Re} \lambda_k.$$

As functions of the complex argument p, the matrix elements $[\alpha_0(p)]_{jk} = -(p - \lambda_k)^{1/2} \delta_{jk}$,
$1 \le j, k \le q$, are defined for all values of p in the right-hand half-plane Re p > $p_0$, they have no
branching points, and they have a negative real part (below, $W^{1/2}$, Re W > 0, is always the value
of the root which lies in the right-hand half-plane). After this unique choice of $\alpha_0(p)$, it is easily
shown, by arguments similar to those employed in [1], that Eq. (1.16) with condition (1.14) has
a unique solution for all values of the complex parameter in the right-hand half-plane

$$\operatorname{Re} p > \omega_0, \quad \omega_0 = \max\{p_0, \max_{1 \le k \le q} \gamma_k\},$$

and for large x, this solution can be expanded in the asymptotic series (1.10), whose coefficients
are formally defined from (1.13). On determining $\alpha_m(p)$ successively from the recurrence
relations (1.13), we can show by induction that


m 


Om (p) = Seco (p), 


hk=1 
where C; are constant matrices, and m = 1, 2, . . . The equation of the manifold of solutions, 


bounded as x > ©, of the system obtained by Laplace inversion of (1.15), takes the form, after 
multiplying on the left by the inverse matrix ag~ !(p): 
(1.17)


m+i 


Om (p)=)" Cxtto-*(p). 


detllae(p) I=] J (p—aa)"*#0 


1i<k< 


for all p in the half-plane Re p > $\omega_0$, then $\alpha_0^{-1}(p)$ exists. We can show directly that the matrices
$\alpha_0^{-k}(p)$, of the form
$$[\alpha_0^{-k}(p)]_{jl} = (-1)^k (p - \lambda_j)^{-k/2}\, \delta_{jl}, \quad 1 \le j, l \le q,$$

possess the matrix-originals $R_k(t)$ for Re p > $p_0$, where


exp (A;t) (t—A;) °-?” 
R,(t) an = 6, 1a ee. 
[Ax (t) Jin T(E/2) j,l<q 





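Noting that $\partial v(x, p)/\partial x$ and v(x, p) are images, with

$$\operatorname{Re} p > \max_{1 \le k \le q} \gamma_k,$$

of the vector-function-originals $\partial u/\partial x$ and u(x, t), respectively, and using the theorem on the
multiplication of images, which holds for $\operatorname{Re} p > \omega_0 = \max\{p_0, \max_k \gamma_k\}$, we obtain from (1.17):

$$R_1(t) * \frac{\partial u}{\partial x} = -u + R(x, t) * u(x, t),$$

$$R(x, t) \sim \sum_{m=1}^{\infty} \frac{R_m(t)}{x^m},$$

and the symbol * denotes the convolution of the functions g(t) and h(t):

$$g(t) * h(t) = \int_0^t g(t - \tau)\, h(\tau)\, d\tau.$$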
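The convolution just defined, and the theorem on the multiplication of images that goes with it, can be checked numerically on a pair of exponentials. In the sketch below the product of the images 1/(p - a) and 1/(p - b) corresponds to the original (exp(at) - exp(bt))/(a - b), which is compared with a midpoint-rule evaluation of the convolution integral. The function name `convolve` and the choice of quadrature are assumptions made for illustration.

```python
import math

def convolve(g, h, t, steps=4000):
    """Midpoint-rule estimate of (g*h)(t) = integral over [0, t] of g(t - tau) * h(tau)."""
    if t == 0.0:
        return 0.0
    d = t / steps
    return d * sum(g(t - (i + 0.5) * d) * h((i + 0.5) * d) for i in range(steps))

a, b, t = -1.0, -3.0, 0.8
numeric = convolve(lambda s: math.exp(a * s), lambda s: math.exp(b * s), t)
# Original of the product of images 1/((p - a)(p - b)):
exact = (math.exp(a * t) - math.exp(b * t)) / (a - b)
print(numeric, exact)
```

The midpoint rule is used because it never evaluates the integrand at the end points, which matters for kernels with an integrable singularity at tau = t.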


We now take the case when the eigenvalues of the matrix $B_0$ are not simple. Let their
multiplicities be $\nu_1, \dots, \nu_K$; $\nu_1 + \dots + \nu_K = q$. Using the same arguments as in the case when all
the eigenvalues are simple, we can assume that $B_0$ has the Jordan form


B=) 4,,, 


Vit... $VK= 


where $J_{\nu_1}, \dots, J_{\nu_K}$ are the lower triangular Jordan cells, corresponding to the multiplicities
$\nu_1, \dots, \nu_K$ of the eigenvalues $\lambda_1, \dots, \lambda_K$. The matrix $\alpha_0(p)$ is a single-valued analytic function of
the complex variable p in the half-plane

$$\operatorname{Re} p > p_0 = \max_{1 \le k \le q} \operatorname{Re} \lambda_k$$
and has the form


Oo (p) =— i SI, 


Vit. $VK=9 


1? 


where Jp, 9=Vi,.--, Va, iS a triangular matrix of the form 


| 9-0 /« ‘ (2p-1)/2 
| 2° (29 — 3)! (p — Ap)” 


- ° ma — —i ke 
and it has an inversea&~'(p) =Jy,-'® ... Ply ~*, since 


detllao(p)II= []  (p—a,)" 


Vit... FVK=Q 


nowhere vanishes in the half-plane Re p > po. It is easily shown that 


(o-3= 


? 


fae { (21—2j+41)/2 
ere) 
detllZ,I gd 


where 1S/<p, 1<jS</, p=v, v,, Ci; are certain constants. 


In the same way, we have 


4 (21—2j41)/2 
P—ho 


(16-15 Cry ( 


since det ||a.~*(p) || 0 in the half-plane Re p > py. Obviously, for Re p > pg, the matrices 
ao *(p), k=41, 2,..., have the matrix-originals 


Oo~*(p) =, (t) =F, 2 eee OR, VR 


Rr (t) |l,s=const exp (Apt) (t—A,) A-7#8-? 73, 1<l<p, 
1SjSl, p=vy,..., Va 


Hence, we can see that, in the half-plane 


Re p>@.=max {y, a}, 


qd 


d = max [ RellB (~) lla + ¥ | B4 (co) ] >p.> max Re A, 


isi<g Foy i<h<q 
and $\gamma$ is the growth exponent with respect to t of the function u(x, t), representing the solution,
bounded as $x \to \infty$, of the system (1.15) with the appropriate initial and boundary data, it is
possible to pass from (1.17) to originals.


3. Let us return to Sec. 1. We assume for simplicity that the matrices Ag, A,, and Bo have 
the diagonal form. Obviously, if Re Ag > 0, then a positive constant w, exists, such that, in the 
domain Re p > w,, the two-valued expression [H,(p)—B,]" can be divided, on the one hand 
into a regular branch with positive real part Re {{H.(p)—B,]"*}>0, and on the other hand, 
into elements of the matrix D,(p) =||H2(p) —Bo|lu=H2"(p) —B,", 1S/S<q, which do not 
vanish. Hence the matrices ag *(p) exist for Re p > w , since det ||ao(p) || =IID,(p) #0, and 
they have the form ||a@o~"(P) llu=(—1)"[D.(p) ]“', 1<l<gq. In addition, with Re p > wy, the 
elements of the matrices @~*(p), k=1, 2..., are analytic functions with no zeros, they tend 
to zero as | p | > © uniformly with respect to arg p, and the integral 


®+i00 


} lao-*(p) |dp 


exists for all kK = 1, 2,...; hence ay *(p) are the images of matrices #, (t), where 


@+ foo 


leeo-*(p) les =1Ra(E) lav (—1)* f 


@—i0o 


e?' dp 
[Di (p) ]** ° 


1<l<q. 


Finally, it must be mentioned that, after multiplying by ag~ !(p), we can pass to images 
in (1.7) in the half-plane Re p>@.=max {@;, Y}, provided that we can justify the operation 


eo 


2 [(P2™)ven]=Vte-tainoen cs 


m=1 m==1 
where #-' is the inverse Laplace transform. To this end, we shall prove: 


Lemma 


Given any fixed $x \in [a, \infty)$, let the functions F(x, p), v(x, p), and every term of the functional
sequence $\{c_k(p)\}$, regarded as functions of the complex argument p, be uniquely defined in the
half-plane $\Pi_{\omega_0} = \{p \mid \operatorname{Re} p > \omega_0\}$. Further, suppose that:

a) the series $\sum_{k=0}^{\infty} c_k(p)/x^k$ is the asymptotic (as $x \to \infty$) series of the function F(x, p), specified
in $Q_{a,p} = [a, \infty) \times \Pi_{\omega_0}$, uniformly with respect to p;

b) the integrals
$$\int_{\omega - i\infty}^{\omega + i\infty} v(x, p)\, e^{pt}\, dp, \quad \int_{\omega - i\infty}^{\omega + i\infty} W(x, p)\, e^{pt}\, dp,$$
where $W(x, p) = F(x, p)\, v(x, p)$, and

$$\int_{\omega - i\infty}^{\omega + i\infty} c_k(p)\, e^{pt}\, dp,$$

where k = 1, 2, ..., exist for Re p > $\omega_0$;

c) the function

$$u(x, t) = (2\pi i)^{-1} \int_{\omega - i\infty}^{\omega + i\infty} v(x, p)\, e^{pt}\, dp$$

is bounded for all $x \in [a, \infty)$ for any fixed $t \in (0, T)$, where T is an arbitrary number.

We then have

$$\int_{\omega - i\infty}^{\omega + i\infty} W(x, p)\, e^{pt}\, dp \sim \sum_{k=0}^{\infty} x^{-k} \int_{\omega - i\infty}^{\omega + i\infty} c_k(p)\, v(x, p)\, e^{pt}\, dp$$

as $x \to \infty$ uniformly with respect to $t \in (0, T)$, where T is arbitrary.

Proof. We have to show that the relation

$$\int_{\omega - i\infty}^{\omega + i\infty} W(x, p)\, e^{pt}\, dp = \sum_{k=0}^{N} x^{-k} \int_{\omega - i\infty}^{\omega + i\infty} c_k(p)\, v(x, p)\, e^{pt}\, dp + o(x^{-N})$$ (1.19)

holds for any integer N > 0 and for all $t \in (0, T)$, where T is arbitrary.

From condition a) we have

$$F(x, p) = \sum_{k=0}^{N} \frac{c_k(p)}{x^k} + o(x^{-N}),$$

which holds for any integer N > 0, and indeed holds uniformly with respect to $p \in \Pi_{\omega_0}$.

Multiplying this relation by $v(x, p)\, e^{pt}$, t > 0, and integrating along the vertical line
Re p = $\omega$, lying to the right of Re p = $\omega_0$, we obtain

$$\int_{\omega - i\infty}^{\omega + i\infty} W(x, p)\, e^{pt}\, dp = \sum_{k=0}^{N} x^{-k} \int_{\omega - i\infty}^{\omega + i\infty} c_k(p)\, v(x, p)\, e^{pt}\, dp + o(x^{-N})\, u(x, t).$$ (1.20)

The required relation (1.19) follows from conditions b) and c) and relation (1.20).
Application of the lemma gives us (1.18). Hence we have:


Theorem 


Assume that 


a) B(x), @o(x), @i(x)and f(x) in (1.1)—(1.3) have the given asymptotic forms (1.6) as 


Dy _- oo* 


b) Re A,>0, pa(t) =O (e”*) as t > © (73% are positive constants), and there exists 
$$\mathcal{L}_t[\mu(t)] = \int_0^{\infty} e^{-pt} \mu(t)\, dt;$$


c) the solution of problem (1.1)—(1.3) u(x, f) is such that, as t > ©, it has a bounded degree 
of growth 


O'Up 
sup | —— | = O(exp(Yint) ) 
Ox 


x>a 


and there exists &,[0'u, / dx'], where 1=0, 1, 2, 1<k<q, Yu are positive constants. 


Then, for any t©(0, 7’), T is arbitrary, the equation of the manifold of solutions, bounded 
as x > %, of problem (1.1)—(1.3) is 


a 
R(t) — ——u(z, t) +R(z, t)ku(z, t) +0 (z, t), 
4 


where A(x, t) and ['(z, t) have the asymptotic expansions 


oo co 


R(x, t) ~ p> sac ae er > 


m=t1 m=0 


and #,(t), An(t) and I, (¢) are defined, for 
Re p>@) = max{@,, Max max Yi} 
o<mli<3 igh 
by the integrals 


@+i00 


|| Bo (t) l].s== (2a) -! j llavo~! (p) Ilse" dp, 


@—tco 
@+i100 


| Bm (t) ll is= (20d) ~* f loo! (p) &m (p) | je?‘ dp, 


@—t oo 
@+i10 


[IPm(¢) l= (2ré)-* [ llow-*(p)Bm(p) Ihe” dp, 1< 
The matrices %m(p) and 8,, (p) are formally defined by the recurrence relations (1.13), 


(1.14). The series 
~) Rm (t) — Tn (t) 
ba ™ and Y* = 


m=0 


are asymptotically convergent as x > ©, for any fixed t= (0,7') ,where T is arbitrary. 


Proof. The Laplace transform of $\Psi_1(x, t) = R_1(t) * \partial u/\partial x + u - R * u - \Gamma$, where u(x, t)
is a solution of (1.1)-(1.3), exists for Re p > $\omega_0$; performing the transformation, we get
Ov ; ;
Wo(a, p) ao —— + v—nu—§, W, (2, p) =Yo(z, p), 








n(z, p) = = Se )e~\2) , E(x, p=) eee (p) 


ai 
m= 1 


The expressions 


. : Ov 
W (x, p)=a,.W, =—— av—8, 
OX 


represent the asymptotic solutions of (1.7), (1.8) and (1.11), (1.12) as x > ©; for them, we obtain 
from (1.7) the equation OW/dx+aW=0, containing the complex parameter p, Re p > wo. 


Let u(x, t) be a solution, bounded as $x \to \infty$, of the problem (1.1)-(1.3); then v(x, p) is also
bounded as $x \to \infty$, for any p, Re p > $\omega_0$. Then, using Lemma 1.1 of [1], for all p, Re p > $\omega_0$,
we have $W(x, p) = \alpha_0 W_1 = 0$. Since $\alpha_0(p)$ never vanishes in the right-hand half-plane Re p > $\omega_0$,
we have $W_1(x, p) = 0$, and hence $\Psi_1(x, t) = 0$ as $x \to \infty$, for any t. Hence any solution,
bounded as $x \to \infty$, of the problem (1.1)-(1.3) appears in (1.21). Conversely, if W(x, p) = 0 for any
$x \in [a, \infty)$ for all p of the half-plane Re p > $\omega_0$, then v(x, p) will satisfy the equation $dv/dx
= \alpha v + \beta$, all the solutions of which, by Theorem 3.1 of [2], Chapter 13, and Theorem 2 of [3],
Chapter 2, are bounded as $x \to \infty$ for any value of the complex parameter p, Re p > $\omega_0$. It
follows from this that all the solutions (1.21) are bounded as $x \to \infty$, and hence they are solutions,
bounded as $x \to \infty$, of the problem (1.1)-(1.3) for all $t \in (0, T)$, where T is arbitrary. The
theorem is proved.


2. Isolation of the solutions, bounded in the neighbourhood of 
a singular point 


1. Suppose that we wish to find the solution of Eq. (1.1), bounded in the interval [0, d],
with initial data (1.2) and a linear boundary condition at x = d, when B(x) has a given asymptotic
expansion as $x \to 0$, e.g., of the form

$$B(x) \sim \frac{B_{-2}}{x^2} + \frac{B_{-1}}{x} + \sum_{k=0}^{\infty} B_k x^k.$$ (2.1)

It follows from the results obtained in [4] that the condition for the solutions of the systems of
ordinary differential equations (1.7) and (1.15), which have a regular singularity at the point
x = 0, to be bounded for sufficiently small x is equivalent to the condition

$$x\, dv/dx = \alpha v + \beta(x, p).$$ (2.2)


Here, we have for $\alpha$ and $\beta$ respectively:

$$\frac{1}{x} \frac{d\alpha}{dx} + \frac{\alpha^2}{x^2} - \frac{\alpha}{x^2} + B(x) - H_2(p) = 0,$$

$$\alpha(x, p) \sim \sum_{k=0}^{\infty} \alpha_k x^k \text{ as } x \to 0,$$

where $\alpha_0$ is the root of the quadratic equation $\alpha_0^2 - \alpha_0 + B_{-2} = 0$.

We shall choose the root for which the eigenvalues have a positive real part. Such a choice
always exists, and is unique, provided that $B_{-2} < \tfrac{1}{4} E$, where E is the unit matrix. Further,

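Here, $g_{-2} = M(p) f_{-2} + A_0 \varphi_{0,-2}\, p + A_0 \varphi_{1,-2} + A_1 \varphi_{0,-2}$, where $f_{-2}$ and $\varphi_{j,-2}$, j = 0, 1, are the
coefficients of $x^{-2}$ in the expansion of the functions $\varphi_j(x)$, j = 0, 1, and f, in the neighbourhood
of zero, i.e., the following expansions are assumed to hold:

$$\varphi_j(x) = \frac{\varphi_{j,-2}}{x^2} + \frac{\varphi_{j,-1}}{x} + \sum_{k=0}^{\infty} \varphi_{j,k} x^k, \quad j = 0, 1,$$

$$f(x) = \frac{f_{-2}}{x^2} + \frac{f_{-1}}{x} + \sum_{k=0}^{\infty} f_k x^k.$$ (2.9)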

Employing similar arguments to those in [4], we see that (2.3), (2.4), and (2.6), (2.7) have
a unique solution for all values of the complex parameter p in the half-plane Re p > $\omega_0$, where
the solution can be expanded in powers of x when x is small. The matrices $\alpha$ and $\beta$ are formally
found from (2.3)-(2.5) and (2.6)-(2.8), while the series (2.5), (2.8) are asymptotically convergent
for small x > 0 and any p, Re p > $\omega_0$. After finding $\alpha$ and $\beta$, it remains to perform an inverse
Laplace transformation in (2.2). The resulting relation effectively replaces for sufficiently small x
the condition that the solutions of Eqs. (1.1), (1.2), (2.1), (2.9), having a regular singularity at the
point x = 0, be bounded at this point.


2. Our method can be used to solve the boundary value problem 


0 “ 4 
a es PRON, ig OM Mids Oe iS 


Ot : Ox? x2 OT 
u|:—.=0,


du/dt+yu|.—-1=p(t), 


u(z,t)=O(1) as x0 forall ¢t>0. 
For Eqs. (2.10) and (2.14), relations (2.3), (2.4) take the form 


(2.14) 


a|x.o.70 for any p, Rep>apo, (2.15) 


where Wo is the growth exponent with respect to ¢ of the solution u(x, t) of problem (2.10)—(2.13). 
From (2.14) and (2.15) we find the following recurrence relations for finding the coefficients in 
the asymptotic expansion: 
img2) me 
dan(p)=— sate a ara, Q2(p)= 


_ 


© ? 


Re=i 


where 
ay if m iseven , 
En = : 
0, if m isodd. 


On successively finding $\alpha_{2m}$ from (2.16), we discover that $\alpha_{2m}(p) = (-1)^{m+1} c_m p^m$, where $c_m$
are positive constants, with $c_{m+1} < c_m$ for all m = 1, 2, ... Henceforth, the following assumptions
will be made regarding the solution of the boundary value problem (2.10)-(2.13):

1) u(x, t) has derivatives up to the k-th order with respect to t, and in addition, $\partial^s u/\partial t^s =
O(1)$ as $x \to 0$, for any t > 0, s = 0, 1, ..., k;

2) all these derivatives, regarded as functions of t, are function-originals.
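Under these assumptions, on taking the inverse Laplace transformation in the relation

$$x \frac{dv}{dx} = \sum_{m=1}^{k} (-1)^{m+1} c_m p^m x^{2m}\, v(x, p) + O(x^{2k+2})\, v(x, p)$$

and noting that
$$u|_{t=+0} = \frac{\partial^s u}{\partial t^s}\Big|_{t=+0} = 0, \quad s = 1, 2, \dots, k,$$

we find that

$$x \frac{\partial u}{\partial x} = \sum_{m=1}^{k} (-1)^{m+1} c_m x^{2m} \frac{\partial^m u}{\partial t^m} + O(x^{2k+2}).$$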
If conditions 1) and 2) hold for all k = 0, 1, ..., then


(2.17) 


Since, in an alternating asymptotic series, the greatest accuracy is obtained when the series 
is broken off at the term which precedes the term of least absolute value, it follows from (2.17) 
that 


9p 1 fee 8), 


+m 


“ o™u . 
R,(z, t) -)) (—1)™* em er. 
— Ox™ 


11 Au 
[Ra(2,t) |< 7 ry z°=O(z"), 


since the absolute value of the error $R_2$ is not greater than the absolute value of the first of the
discarded terms. It is thus sufficient to require that conditions 1) and 2) hold only for s = 0, 1, 2,
since further assumptions about the smoothness of u(x, t) with respect to t do not improve the
asymptotic convergence of the series (2.17). The behaviour of the manifold $S_\varepsilon = \{u(x, t) =
O(1),\ x < \varepsilon,\ t > 0\}$ of solutions of (2.10), (2.13), bounded as $x \to 0$, for any t > 0, in the
x, t plane is thus described by the first-order partial differential equation


(2.18) 


up to an accuracy of $\varepsilon^2$. Notice that, from (2.18) as $x \to 0$, we have $(\partial u/\partial x)|_{x=0} = 0$, as might
be expected [5]; in addition, the relation

$$\frac{u(t, h) - u(t, 0)}{h} \approx \frac{h}{4} \frac{\partial u}{\partial t}(t, 0),$$

which approximates the condition $(\partial u/\partial x)|_{x=0} = 0$ on the solution of (2.10) to order $h^2$, is
easily obtained from (2.18). On writing the equation of the manifold (2.18) of bounded solutions,
at a point $x_0 = h/2$, sufficiently close to the singular point, and noting that


=OQO(h?), u(t, x.)—u(t,h)=O(h), 





du u(t, h) —u(t, 0) 
Se a h 


Ox ohne h 4 Ot 





(= ou u(t,h)—u(t,0)  h du(t,0) ) =o), 


which is in agreement with [5]. 
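The difference relation near the singular point can be tried on an explicit solution. The sketch below assumes that (2.10) is the radially symmetric heat equation u_t = u_xx + u_x / x (an assumption suggested by the factor h/4, since the printed equation is not fully legible), takes the exact solution u = exp(-x^2 / (4(t + t0))) / (t + t0), which is bounded at x = 0, and measures how well (u(t, h) - u(t, 0)) / h is reproduced by (h/4) * du/dt(t, 0). The function names and the shift t0 are hypothetical.

```python
import math

def u(x, t, t0=1.0):
    """Exact bounded solution of u_t = u_xx + u_x / x (assumed form of the model equation)."""
    s = t + t0
    return math.exp(-x * x / (4.0 * s)) / s

def du_dt_axis(t, t0=1.0):
    """Time derivative of u on the axis x = 0, i.e. d/dt of 1 / (t + t0)."""
    return -1.0 / (t + t0) ** 2

def residual(h, t=0.5):
    """Difference between the one-sided difference quotient and (h/4) * u_t(t, 0)."""
    forward_difference = (u(h, t) - u(0.0, t)) / h
    return forward_difference - 0.25 * h * du_dt_axis(t)

for h in (0.2, 0.1, 0.05):
    print(h, residual(h))
```

For this particular solution the residual is in fact of order h^3, which is consistent with (and slightly better than) the order h^2 stated in the text.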


Note. The operator $L_2[\partial/\partial x]$ in Eq. (1.1) can have the more general form

$$\frac{\partial^2}{\partial x^2} + B(x) \frac{\partial}{\partial x} + C(x),$$

where B(x) and C(x) are matrices of order q, which can simultaneously have irregular singularities
of the 1st and 2nd kinds of integral order at infinity, and at the point x = 0, a regular singularity
of the same integral order. In this case also, we can "move" in a similar way from the singular
point x = 0 to a sufficiently close point $x_0$, and from infinity to a sufficiently remote point $x_\infty$.
After "displacement" of the entire linear manifold of bounded solutions to these points, the
equation has to be approximated to the same order of accuracy as that possessed by the difference
scheme of initial equations.


3. Comparison with solutions obtained by the method of straight lines 


The displacement of the condition for boundedness from a regular and an irregular point 
for Eqs. (1.1)—(1.3) to the corresponding points of numerical integration, can also be performed 
by the method of straight lines. In order to compare the first approach with the second, we shall 
apply the method of straight lines e.g., to the boundary value problem 

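$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad t > 0,\ x \in (0, \infty), \quad u|_{t=0} = 0, \quad u|_{x=0} = \varphi(t),$$ (3.1)

$$|u(x, t)| \to 0 \text{ as } x \to \infty \text{ for any } t > 0.$$ (3.2)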


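After discretization with respect to the time variable in the finite interval [0, T] in (3.1), we
obtain a system of ordinary differential equations, containing the small parameter $\tau$:

$$\frac{u^{n+1} - u^n}{\tau} = \frac{d^2 u^{n+1}}{dx^2}, \quad \text{where } \tau = \frac{T}{N},$$ (3.3)

with the condition that the solutions be bounded as $x \to \infty$. Introducing the vector $Y = \{u_1, \dots, u_N\}$,
we rewrite (3.3) in the matrix form

$$d^2 Y/dx^2 + BY = 0,$$ (3.4)

where B is a constant N-th order matrix of the form

$$B = \frac{1}{\tau} \begin{pmatrix} -1 & & & \\ 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{pmatrix}.$$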
The unknown matrix $\alpha$, appearing in the equation of the manifold of solutions, stable as
$x \to \infty$, of Eq. (3.4), which latter has an irregular singularity at infinity, is found from the relations
$$\alpha' + \alpha^2 + B = 0, \quad \alpha|_{x \to \infty} = \alpha_0 = -(-B)^{1/2}.$$

It is easily shown that all the eigenvalues of the matrix $\alpha_0 = -(-B)^{1/2}$ are equal to
$-(1/\tau)^{1/2}$; hence the coefficients in the asymptotic expansion of the matrix $\alpha$ in the neighbourhood
of infinity, depend on the small parameter 7; this dependence is of the type a,=O(1~*”’). 


On refining the mesh with respect to t, the numbers $\alpha_k$ become infinitely large. Hence the
point $x_\infty$, to which the boundedness condition is "displaced" from infinity, depends additionally
on the step $\tau$; if $\tau' < \tau$, then $x_\infty(\tau) < x_\infty(\tau')$.

Condition (3.2) for Eq. (3.1) is thus approximated non-uniformly with respect to $\tau$ by the
method of straight lines. To a first approximation we have


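$$\frac{dY}{dx}\Big|_{x = x_\infty(\tau)} \approx -\tau^{-1/2}\, Y\big|_{x = x_\infty(\tau)}.$$ (3.5)

In other words,

$$\tau^{1/2} \frac{\partial u}{\partial x} \approx -u(x_\infty, t).$$

The approach described in Section 1 gives, for sufficiently large $x_\infty$, the following integral
approximation, which takes account of all the preceding layers:

$$\int_0^t \frac{\partial u(x_\infty, \sigma)}{\partial x} \frac{d\sigma}{[\pi (t - \sigma)]^{1/2}} \approx -u(x_\infty, t).$$ (3.6)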
Notice that, as $x \to \infty$, the equation of the manifold of bounded solutions is the same, to a
first approximation, for equation (3.3) as for the equation $u^{n+1} = \tau\, d^2 u^{n+1}/dx^2$. Consequently,
on displacing the boundedness condition from layer to layer for Eq. (3.3) by the method of
straight lines, the second term $-u^n/\tau$ on the left-hand side of (3.3) is "frozen" for sufficiently large x,
with the result that relation (3.5) becomes diagonal, as distinct from (3.6); this indicates one of
the qualitative differences between the two approaches.
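The integral form of the displaced condition can be checked on a solution of (3.1) that is known in closed form. For the boundary data equal to 1 at x = 0, the solution is u(x, t) = erfc(x / (2 sqrt(t))), and the integral of du/dx against the kernel 1/sqrt(pi (t - sigma)) should reproduce -u(x, t) at any x, not only asymptotically. The sketch below verifies this numerically; the function names and the substitution sigma = t - s^2 used to remove the kernel singularity are choices made for this illustration and are not taken from the paper.

```python
import math

def u(x, t):
    """Solution of u_t = u_xx with u = 0 at t = 0, u = 1 at x = 0, decaying as x grows."""
    return math.erfc(x / (2.0 * math.sqrt(t)))

def du_dx(x, t):
    """Spatial derivative of the solution above."""
    return -math.exp(-x * x / (4.0 * t)) / math.sqrt(math.pi * t)

def history_integral(x, t, steps=20000):
    """Integral over [0, t] of du/dx(x, sigma) / sqrt(pi * (t - sigma)) d sigma.

    With sigma = t - s**2 the weight d sigma / sqrt(t - sigma) becomes 2 ds,
    so a plain midpoint rule in s can be used.
    """
    d = math.sqrt(t) / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * d
        total += du_dx(x, t - s * s)
    return 2.0 * d * total / math.sqrt(math.pi)

x_inf, t = 1.0, 1.0
left = history_integral(x_inf, t)
right = -u(x_inf, t)
print(left, right)
```

In contrast with the local relation of type (3.5), the left-hand side here depends on the whole history of du/dx at the point, which is the qualitative difference discussed in the text.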


4. Application to a physical problem 


The propagation of stationary axisymmetric light beams in a cubic medium is modelled in the 
parabolic equation approximation by the following non-linear boundary value problem [6, 7]: 


$$i \frac{\partial u}{\partial z} = \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + |u|^2 u, \quad 0 < z, r < \infty,$$ (4.1)

$$u|_{z=0} = \varphi(r),$$ (4.2)

$$|u| \to 0 \text{ as } r \to \infty \text{ for any } z > 0.$$ (4.4)

It is easily shown by direct calculation that Eq. (4.1) has the integral of motion

$$\int_0^{\infty} |u|^2 r\, dr = \text{const}.$$ (4.5)


0 


In the light of (4.5), it is natural to assume from physical considerations that u(r, z), as a function of 
r, decreases more rapidly than 1/r at infinity. 


Hence the non-linear term in (4.1) decreases at infinity at a rate of at least $1/r^3$, so that it
can be neglected for sufficiently large r. In view of this, the non-linear equation (4.1) can be replaced
for large r by a linear equation, and the method described in Section 1 becomes applicable to the
boundary value problem. On successively applying the procedure described in Section 1, we find
after calculations that the equation of the manifold of solutions, bounded as $r \to \infty$, of the problem
(4.1)-(4.4), is


=—u/(r, 2) 





(2/n) ? E E) dé 
{+i Or (z—§)” 


4 (—1)*c,2*/ =| u(r, &) (z—&) “2”? dE—qa 


(1+i)" 
Q(h+2—3)/2 


ca —{)*** 4 Ons; 
heey (1+i)°#2-)  T'((k+2—j) /2) 





where I(k/2) is the gamma function, @;, j=0, 1,..., are the coefficients of the asymptotic 
expansion of the function ¢(r) close to infinity, anit! |e, “hey “le, /aes, .» .) 0 Cx; is the 
triangular matrix 

1/y CQ 

i 

hh ht | 


To a first approximation we have the relation 





(2/n)* ¢ Ou(ra,8) dE : 
1+i J or (2—E)* U(T.,2)— Po, (4.6) 


which correctly approximates the boundary condition (4.4) for Eq. (4.1). For the difference 
analogue of the boundary condition (4.6), we propose the following approximation with respect 
to Zz: 
f dU(T.., E) dg as ". du(Ta, §) d& 
Or (z—§)” or (z—£)" 


hk=0 2 











Au (ra. Zn41) 4 aeeeoe Ii ae 
(z—€)"* 
n—i
Oul(T.., Saas) Ou (re, Zr) 
-y| ra | ez) G1. 
or Or : 
k=0 
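One way to discretize an integral with the weakly singular kernel (z - xi)^(-1/2) is to integrate the kernel exactly over each mesh interval and to replace the smooth factor by the mean of its end-point values. The sketch below implements this rule as an illustration; it is a generic product-integration formula and is only assumed to resemble the difference analogue proposed in the text, whose printed form is not fully legible. The function name and the test integrands are hypothetical.

```python
import math

def weakly_singular_integral(f_values, z_nodes):
    """Approximate the integral over [z_0, z_n] of f(xi) / sqrt(z_n - xi) from nodal values of f."""
    z_end = z_nodes[-1]
    total = 0.0
    for k in range(len(z_nodes) - 1):
        # Exact integral of the kernel 1 / sqrt(z_end - xi) over [z_k, z_{k+1}].
        weight = 2.0 * (math.sqrt(z_end - z_nodes[k]) - math.sqrt(z_end - z_nodes[k + 1]))
        # Smooth factor replaced by the mean of its end-point values.
        total += 0.5 * (f_values[k] + f_values[k + 1]) * weight
    return total

n = 200
z_nodes = [k / n for k in range(n + 1)]
constant_case = weakly_singular_integral([1.0] * (n + 1), z_nodes)   # exact value 2
linear_case = weakly_singular_integral(z_nodes, z_nodes)             # exact value 4/3
print(constant_case, linear_case)
```

Because the kernel is integrated in closed form, the singular end point xi = z causes no loss of accuracy beyond that of the piecewise approximation of the smooth factor.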


The proposed method was realized on the BESM-6 computer. As the initial approximation
we took the Gaussian distribution $\varphi(r) = \exp(-r^2/2l^2)$, where l is the characteristic width of
the initial pencil. For numerical integration of problem (4.1)-(4.4), Eq. (4.1) was linearized and
approximated by an implicit two-layer second-order finite difference scheme with respect to both
coordinates. In order to allow for the singularity of the behaviour of the field close to the axis,
and to take a sufficiently large (with respect to r) interval of numerical integration, a non-uniform
r mesh was used, with a specific law of variation of the integration step. Condition (4.4) was
approximated by relation (4.6). To check the accuracy of the computations, the
conservation of the energy integral (4.5) was utilized. The results obtained are in agreement with
those obtained in [6].


In conclusion, the author sincerely thanks A. A. Abramov for suggesting the problem and 
for his assistance, and also S. A. Gabov for valuable comments. 


Translated by D. E. Brown 
AN APPROXIMATE description of the non-diffusion term in the neutron flux expression is given,
and is justified numerically using elementary examples. 


Introduction 


It is well known that, in practical applications of neutron transport theory, a detailed 
knowledge of the space behaviour of the neutron flux is required, as well as high accuracy in 
computing the integral characteristics of the system. There are methods (see e.g., [1] ) which provide 
highly accurate computation of integral characteristics such as the critical dimensions, yet do not 
yield information about the dependence of the flux on the space coordinate close to the interface 
(in the method of [1], use is only made of the fact that a jump is present in the asymptotic currents 
at the interface). The same feature is to be found in the zero approximation of the scheme
developed in [3, 4] for solving multi-layer problems with the aid of Case's method [2]. Generally
speaking, the schemes for solving such problems with the aid of Case’s method (see [3, 4] , and 
also [5,6] ) could enable the integral characteristics as well as the flux behaviour close to the 
interface to be computed to high accuracy. But it is actually very laborious to obtain higher than 
the zero approximations for the scheme of solution e.g., of [3]. 


In this connection, it seems sensible to try to describe the non-diffusion term in the flux 
expression in an elementary (rational) way, such that, on the one hand, good accuracy in computing 
the integral characteristics is achieved, and on the other hand, transition effects close to an 
interface can be approximately described. The possibility of such an approach, whereby the solution 
of singular integral equations can be avoided, is mentioned in [7]. The realization of a similar 
approach is referred to in [8] for the case of non-centralized layers; here, direct use is made of the 
scheme of solution of the two-zone problem with the aid of Case’s method [2]. In the present 
paper we justify numerically a simple description of the non-diffusion term in the neutron flux 
expression, aimed at preserving the singularity in the behaviour of the solution close to an interface. 
Sections 1 and 2 deal with the plane and the spherical geometries respectively. In Section 3 we 
comment on the limits within which our proposed method may be used. 


1. Milne’s problem; critical dimension of the plate 


The numerical justification of our description of the non-diffusion component of the neutron
flux in a plane geometry will first be given for the case of Milne's problem for a non-absorbent
medium (see e.g., [9]). As a preliminary, notice that, in the plane geometry, the angular dependence
of the neutron flux $\Psi(x, \mu)$, where $\mu$ is the cosine of the angle between the direction of neutron
movement and the positive x axis, has a discontinuity at the interface for $\mu = 0$. It is easily seen
from simple geometry that

$$\Psi(x_0, -0) - \Psi(x_0, +0) = \frac{c_1 - c_2}{2}\, \Psi(x_0),$$

where $c_1$, $c_2$ are the numbers of secondary neutrons in collision to the right and left respectively
of the interface $x = x_0$, and

$$\Psi(x) = \int_{-1}^{1} \Psi(x, \mu)\, d\mu$$

is the neutron flux.


We shall show that, for this reason, $d\Psi(x)/dx$ is unbounded at $x = x_0$. From Boltzmann's
equation one can obtain


i 


W (z,+6, —p) —  (z,+6, 
=limf (x 1) (z,+6, p) F 
xo+6 


ae L 
60% L 





d 
lim — VY (z) : 
s+0 L dx 

after which, on taking account of the discontinuity of $\Psi(x, \mu)$ for $\mu = 0$, it is easily seen that
the derivative $d\Psi(x)/dx$ for $x = x_0$ is unbounded.


When describing approximately the non-diffusion component of the flux, we shall try to
preserve the property just mentioned. For the flux in Milne's problem, in accordance with [2], we
can write (x > 0)

$$\Psi(x, \mu) = H_1 + (x - \mu) + \int_0^1 e^{-x/\nu} H(\nu)\, \Phi(\nu, \mu)\, d\nu,$$ (1.1)

where $H_1$ and $(x - \mu)$ are the eigenfunctions of the discrete part of the spectrum $\nu \notin [-1, 1]$,
describing the asymptotic (remote from the interface) behaviour of the neutron flux;

$$\Phi(\nu, \mu) = \frac{c\nu}{2}\, P \frac{1}{\nu - \mu} + \lambda(\nu)\, \delta(\nu - \mu)$$

are the eigenfunctions of the continuous part of the spectrum $\nu \in (-1, 1)$, the expansion with
respect to which in fact describes the non-diffusion component of the flux (1.1); c = 1,
$\delta(\nu)$ is the delta function, and x is measured in free path lengths.


If we solve strictly the problem in question, using the theory of singular integral equations
(see e.g., [2]), we can show that H(1) = 0. We shall seek $H(\nu)$ approximately in the form

$$H(\nu) = (1 - \nu) H_2.$$ (1.2)

This form of $H(\nu)$ ensures that $d\Psi(x)/dx$ is divergent at the interface x = 0. The approximation
$H_2 = 0$ corresponds to an asymptotic consideration. The approximation (1.2) enables us to take
effective account in (1.1) of the non-diffusion term, describing the influence of the interface with
the vacuum on the flux behaviour. To find $H_1$ and $H_2$ we use the Marshak boundary conditions [9]

$$\int_0^1 \mu^{2k+1}\, \Psi(0, \mu)\, d\mu = 0,$$ (1.3)

where k = 0, 1.


Substituting expression (1.1) in (1.3) and using (1.2), we obtain the following system of
equations for $H_1$, $H_2$:


wH =f, 
where 


1 4 


oS 
‘ 19? hi Z a? 
23 2k+2 


2h+1 
4 








1 1 
W,2 = a Se 
(2k+2)(2k+3) 2 >. (2k+2—1) (241) (I+4) 
i=1 


err 
5 Lente (0), 


_ 


and we use the notation

$$I_n^{(+)}(x) = \int_0^1 \nu^n (1 - \nu)\,\ln\!\left(\frac{1}{\nu} + 1\right) e^{-x/\nu}\,d\nu.$$

The $I_n^{(+)}(0)$ required for our computations are equal to $I_2^{(+)}(0) = 0.0871$, $I_4^{(+)}(0) = 0.03004$.
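The two quoted constants can be checked by direct quadrature. The sketch below assumes that the notation reads $I_n^{(+)}(x) = \int_0^1 \nu^n (1-\nu) \ln(1/\nu + 1)\, e^{-x/\nu}\, d\nu$ and that the two quoted values belong to n = 2 and n = 4; the function name `i_plus` and the midpoint rule are illustrative choices, not part of the original computation.

```python
import math

def i_plus(n, x=0.0, steps=20000):
    """Midpoint-rule estimate of the integral over (0, 1) of
    v**n * (1 - v) * ln(1/v + 1) * exp(-x/v)."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        v = (j + 0.5) * h  # midpoints avoid v = 0, where ln(1/v + 1) is singular
        total += v**n * (1.0 - v) * math.log(1.0 / v + 1.0) * math.exp(-x / v)
    return total * h

i2 = i_plus(2)
i4 = i_plus(4)
print(f"I_2(0) = {i2:.4f}, I_4(0) = {i4:.5f}")
```

Under these assumptions the quadrature agrees with the quoted 0.0871 and 0.03004 to the printed precision, which supports the reading of the indices as 2 and 4.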


Of the two unknown coefficients, $H_1$ means physically the distance at which the asymptotic
density of the neutrons vanishes when it is extrapolated into the vacuum [9]. Our computed
value $H_1 = 0.71199$ differs from the exact value $H_1 = 0.71045$ [9] by 0.2% (relative deviation).
Notice that, in the asymptotic approximation, we have $H_1 = 0.66667$ (relative deviation 6%), while
in the $P_7$ approximation of the method of spherical harmonics, we have $H_1 = 0.70692$ (relative
deviation 0.5%) [10].
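The quoted percentages follow directly from the four numbers in the paragraph above. A minimal sketch of the arithmetic (the dictionary keys and function name are illustrative labels only):

```python
exact_h1 = 0.71045  # exact extrapolation length quoted from [9]

estimates = {
    "approximation (1.2)": 0.71199,
    "asymptotic": 0.66667,
    "spherical harmonics": 0.70692,
}

def relative_deviation_percent(value, reference):
    """Absolute relative deviation of value from reference, in percent."""
    return abs(value - reference) / reference * 100.0

deviations = {
    name: relative_deviation_percent(value, exact_h1)
    for name, value in estimates.items()
}
for name, dev in deviations.items():
    print(f"{name}: {dev:.2f}%")
```

The three results round to the 0.2%, 6% and 0.5% stated in the text.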


In addition, we computed: 


1) the neutron flux at the vacuum interface, where the non-diffusion effect is a maximum:

$$\Psi(0) = \Psi(x)\big|_{x=0} = \frac{2}{J}\left[H_1 + x + \frac{H_2}{2}\big(E_2(x) - E_3(x)\big)\right]_{x=0}, \qquad E_n(x) = \int_0^1 y^{n-2} e^{-x/y}\,dy,$$

where the quantity J normalizes the expression (1.1) in such a way that

$$\int_{-1}^{0} \mu\,\Psi(0, \mu)\,d\mu = -1,$$

and is equal to $J = 0.5H_1 + 0.(3) + 0.5H_2\,[1/6 - I_2^{(+)}(0)]$, while $H_1 = 0.71199$, $H_2 = -0.56957$,
so that J = 0.66666;

2) the angular distribution of the neutrons leaving the half-space:

$$\Psi(0, -|\mu|) = \frac{1}{J\,\Psi(0)}\Big\{H_1 + |\mu| + 0.5H_2\Big[0.5 + |\mu| - |\mu|(1 + |\mu|)\ln\big(1 + 1/|\mu|\big)\Big]\Big\}.$$

The results obtained for $\Psi(0, -|\mu|)$ are given in Table 1.


TABLE 1

|μ|    Ψ_a(0,-|μ|)   Ψ(0,-|μ|)   Ψ_T(0,-|μ|)   Ψ~(0,-|μ|)
0.0    0.5000        0.5000      0.5000        0.4936
0.1    0.5707        0.6287      0.6236        0.6207
0.2    0.6414        0.7330      0.7252        0.7238
0.3    0.7573        0.8313      0.8213        0.8207
0.4    0.7828        0.9265      0.9146        0.9148
0.5    0.8535        1.0199      1.0064        1.0070
0.6    0.9242        1.4124      1.0974        1.0980
0.7    0.9950        1.2035      1.1870        1.1880
0.8    1.0657        1.2941      1.2764        1.2777
0.9    1.1364        1.3845      1.3653        1.3668
1.0    1.2071        1.4744      1.4539        1.4560
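The quantities listed under 1) and 2) can be recomputed from the quoted constants. The sketch below assumes the readings $J = 0.5H_1 + 1/3 + 0.5H_2[1/6 - I_2^{(+)}(0)]$, $\Psi(0) = (2/J)(H_1 + H_2/4)$ (using $E_2(0) - E_3(0) = 1/2$), and $\Psi(0,-|\mu|) = \{H_1 + |\mu| + 0.5H_2[0.5 + |\mu| - |\mu|(1+|\mu|)\ln(1+1/|\mu|)]\}/(J\Psi(0))$; these are reconstructions of the printed formulas, and all function and variable names are hypothetical.

```python
import math

H1, H2 = 0.71199, -0.56957   # coefficients quoted in the text
I2_AT_0 = 0.0871             # I_2^(+)(0), quoted in the text

# Normalizing constant J and the scalar flux at the vacuum interface.
J = 0.5 * H1 + 1.0 / 3.0 + 0.5 * H2 * (1.0 / 6.0 - I2_AT_0)
flux_at_interface = (2.0 / J) * (H1 + 0.25 * H2)  # uses E_2(0) - E_3(0) = 1/2

def outgoing_distribution(mu_abs):
    """Assumed form of Psi(0, -|mu|), normalized by J * Psi(0)."""
    if mu_abs == 0.0:
        log_term = 0.0  # limit of |mu| (1 + |mu|) ln(1 + 1/|mu|) as |mu| -> 0
    else:
        log_term = mu_abs * (1.0 + mu_abs) * math.log(1.0 + 1.0 / mu_abs)
    bracket = 0.5 + mu_abs - log_term
    return (H1 + mu_abs + 0.5 * H2 * bracket) / (J * flux_at_interface)

print(f"J = {J:.5f}, Psi(0) = {flux_at_interface:.3f}")
for k in range(11):
    mu = k / 10
    print(f"{mu:.1f}  {outgoing_distribution(mu):.4f}")
```

With these assumptions the sketch gives J close to 2/3 and a surface flux close to 1.709, and the values at |μ| = 0, 0.1 and 1.0 agree with the 0.5000, 0.6287 and 1.4744 tabulated for the approximation (1.1), (1.2).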


Let us next indicate how the quantity $\Psi(0, -|\mu|)$ just determined can be improved in
accuracy. It follows from simple geometry that, for a non-absorbent half-space, we have

$$\Psi(x, \mu) = \begin{cases} 0.5\displaystyle\int_0^{x/\mu} e^{-\xi}\,\Psi(x - \xi\mu)\,d\xi, & \mu > 0, \\[2mm] 0.5\displaystyle\int_0^{\infty} e^{-\xi}\,\Psi(x - \xi\mu)\,d\xi, & \mu < 0. \end{cases}$$

If we substitute the $\Psi(x)$ defined above in this relation, we can obtain for $\tilde\Psi(0, -|\mu|)$ the
expression

$$\tilde\Psi(0, -|\mu|) = \frac{1}{J\,\tilde\Psi(0)}\Big\{H_1 + |\mu| + 0.5H_2\Big[0.5 + |\mu| - |\mu|(1 + |\mu|)\ln\big(1 + 1/|\mu|\big)\Big]\Big\},$$

where $H_1$, $H_2$, J are given above. The quantity $\tilde\Psi(0) = 1.731$ deviates from the exact value
$\Psi(0) = 1.732$ (see [11]) by 0.06%, whereas $\Psi(0) = 1.709$ has a relative deviation of 1.3%.
The values of the function $\tilde\Psi(0, -|\mu|)$ are also quoted in Table 1. The function $\Psi_a(0, -|\mu|)$,
also in Table 1, corresponds to the asymptotic approximation ($H_2 = 0$), while $\Psi_T(0, -|\mu|)$ is
computed to high accuracy in [12].


To sum up, the use of the approximation (1.2) has provided satisfactory computational
accuracy both for the integral characteristic $H_1$, and for the behaviour of the neutron flux close to
the interface.
Now consider the problem of finding the critical half-thickness of a plate of neutron multiplying
material. Using the method described in [2] and symmetry arguments (the interface $x = \pm a$), we
can write for the neutron flux:

$$\Psi(x, \mu) = H_1\,\frac{\nu_1\cos(x/\nu_1) + \mu\sin(x/\nu_1)}{\nu_1^2 + \mu^2} + H_2(|\mu|)\,\lambda_1(\mu)\,e^{\mp x/|\mu|} + \frac{c_1}{2}\int_0^1 \nu H_2(\nu)\left(\frac{e^{-x/\nu}}{\nu - \mu} + \frac{e^{x/\nu}}{\nu + \mu}\right) d\nu, \qquad (1.4)$$

$$\lambda_1(\mu) = 1 - \frac{c_1\mu}{2}\,\ln\!\left(\frac{1 + \mu}{1 - \mu}\right);$$

the upper sign refers to $\mu > 0$, and the lower to $\mu < 0$.


We shall seek $H_2(\nu)$ as

$$H_2(\nu) = (1 - |\nu|)\,H_2\,e^{-a/|\nu|}. \qquad (1.5)$$

It can be shown that the exponential factor also appears when the problem is considered strictly.
Expression (1.5) provides divergence of $d\Psi(x)/dx$ for $x = \pm a$. We find the constants $H_1$ and
$H_2$ by using Marshak's conditions [9]:

$$\int_0^{-1} \mu^{2k+1}\,\Psi(a, \mu)\,d\mu = 0, \qquad (1.6)$$

where k = 0, 1.

After substituting expression (1.4) in (1.6) and using (1.5), we obtain the following system of
homogeneous equations in $H_1$, $H_2$:

$$wH = 0, \qquad (1.7)$$


Vi a 
W.,= oe Tih (1+,7-?) cos — — 
2 v4 
v4 a G—-%]. a 
We, = =a ii~v:' In(1+v.7?) Jeos — — [ve sin —, 
vy C; v1 


8, (20)— 8, (0) — 1 (2a) — 12 (+ — 8,0), 





oy eNO 2 “1 &,(0) 
» ete i 


= = l=1 
ee (+) (+) 
+ &,(0)— 1!" (2a) — I 0) |, 
Cy 


$$\mathcal{E}_n(x) = \int_0^1 \nu^n (1 - \nu)\,e^{-x/\nu}\,d\nu = E_{n+2}(x) - E_{n+3}(x).$$


The critical half-thickness of the plate a is the least positive root of the equation 








ey oe CiV4 In(1+v,7?) = [In (1+v,-?) : xOE c,/3(c,—4) 
nl 2(c,—1) W22/Wy,+v,2—C,/3(C,—1) \. 


Computational results from this last expression are given in Table 2, where they are compared with
the results obtained in the $S_{16}$ approximation by Carlson's method, and in the V approximation
of the variational method (see [13]). The values of $a_0$ correspond to the asymptotic approximation,
and in the parentheses we quote the relative deviations from the values $a_T$ (in %), which represent
the most accurate values. The quantities $\nu_1^{-1}$ were borrowed from [1], where they were computed
for a large set of $c_1$ values.
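The text takes $\nu_1^{-1}$ from the tables of [1]. For orientation only, the sketch below assumes the standard one-speed dispersion relation for isotropic scattering with c > 1, namely $\kappa = c\,\arctan\kappa$ with $\kappa = 1/\nu_1$; this relation is not stated in the text, and the function name `kappa_for` and the bisection scheme are hypothetical choices.

```python
import math

def kappa_for(c, tol=1e-12):
    """Positive root kappa = 1/nu_1 of kappa = c * arctan(kappa), for c > 1."""
    if c <= 1.0:
        raise ValueError("an imaginary discrete root requires c > 1")

    def residual(k):
        return c * math.atan(k) - k

    lo, hi = 1e-9, 1.0
    while residual(hi) > 0.0:   # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:        # plain bisection
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for c1 in (1.05, 1.2, 1.6):
    kappa = kappa_for(c1)
    print(f"c1 = {c1}: 1/nu_1 = {kappa:.6f}")
```

Bisection is used because the residual is concave with a single positive zero, so a bracket always exists for c > 1; tabulated values from [1] should be preferred where available.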


TABLE 2

3.3344 (1.03)   2.1388 (1.20)   1.3014 (0.94)   0.7322 (0.60)
3.2989 (0.04)   2.1124 (0.05)   1.2896 (0.02)   0.7394 (0.40)
3.3002          2.1134          1.2893          0.7366
3.3023 (0.06)   2.1146 (0.05)   1.2902 (0.06)   0.7372 (0.10)














2. Critical size of the sphere 


We shall first consider the problem of finding the critical radius of a sphere of homogeneous,
neutron multiplying material. We know from [1, 9] that the Peierls integral equation can in this
case be reduced to an integral equation, formally identical with the equation for the plane geometry,
which corresponds to an integro-differential equation, of the same form as Boltzmann's equation
in the x geometry. The solution of this equation, i.e., the pseudo-distribution $\Psi(r, \mu)$, is connected
with the true flux $\Psi(r)$ by the relation

$$r\,\Psi(r) = \int_{-1}^{1} \Psi(r, \mu)\,d\mu$$

(here, $\mu \in [-1, 1]$ is a non-physical parameter).

On the outer surface, $\Psi(r, \mu)$ has to satisfy the condition

$$\Psi(R, \mu < 0) = 0, \qquad (2.1)$$

and moreover,

$$\Psi(r, \mu) = -\Psi(-r, -\mu). \qquad (2.2)$$

Here, R is the required radius of the sphere, expressed in free path lengths.
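The method described in [2] can be used to find $\Psi(r, \mu)$; hence, using (2.2), we write

$$\Psi_1(r, \mu) = H_1^{(1)}\,\frac{\nu_1\sin(r/\nu_1) - \mu\cos(r/\nu_1)}{\nu_1^2 + \mu^2} \pm H_2^{(1)}(|\mu|)\,\lambda_1(\mu)\,e^{\mp r/|\mu|} + \frac{c_1}{2}\int_0^1 \nu H_2^{(1)}(\nu)\left(\frac{e^{-r/\nu}}{\nu - \mu} - \frac{e^{r/\nu}}{\nu + \mu}\right) d\nu, \qquad (2.3)$$

where the upper sign refers to $\mu > 0$, and the lower to $\mu < 0$. If we solve
problem (2.1) strictly for (2.3), we can show that $H_2^{(1)}(1) = 0$ and $H_2^{(1)}(|\mu|) \sim e^{-R/|\mu|}$. We shall seek
$H_2^{(1)}(\nu)$ in the form

$$H_2^{(1)}(|\nu|) = (1 - |\nu|)\,e^{-R/|\nu|}\,H_2^{(1)}. \qquad (2.4)$$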
This approximation implies that $d\Psi(r)/dr$ is divergent for r = R. The latter property also follows
from the transport equation in spherical geometry. From it we can obtain directly, for the
homogeneous sphere,

$$\lim_{\delta\to 0}\frac{d}{dr}\Psi(r)\Big|_{R+\delta} = \lim_{\delta\to 0}\left\{\int_0^1 \frac{\Psi(R+\delta, -\mu) - \Psi(R+\delta, \mu)}{\mu}\,d\mu - \frac{1}{R+\delta}\int_0^1 \frac{1 - \mu^2}{\mu}\left[\frac{\partial}{\partial\mu}\Psi(R+\delta, \mu) + \frac{\partial}{\partial\mu}\Psi(R+\delta, -\mu)\right] d\mu\right\}$$

($\mu$ is the cosine of the angle between the neutron velocity vector and the radius vector R). Simple
geometry shows that, if $\Psi(R, -0) = \Psi(R, +0)$, then the derivative $\partial\Psi(R, \mu)/\partial\mu$ has a
discontinuity for $\mu = 0$ (see also the computations in [14]). Hence it follows at once that
$d\Psi(r)/dr$ is unbounded on the interface with the vacuum.


The unknown coefficients in (2.3) may be found by replacing the exact condition (2.1) by 
approximate conditions, representing an analogue of Marshak’s conditions: 
= 


j n+. (R, uw) du=0, (2.5) 


0 


where k = 0, 1.

Substituting Eq. (2.3) into Eq. (2.5) and using Eq. (2.4), we obtain a system of homogeneous
equations of the type (1.7), where
c,—1 


Vv R 
Wu. = —In(1+v,7?) sin —+ 
2 V1 C; 
R 
cos —, 
Vi 


; “ike Pigs 
Wa, = * [1-2 In(1+v,7”) ]sin — + | ‘f,—v,2 (cvs) | 
- Vi Cy 


* a 
6 (2K) + &,() +1" (0-15? 2R)-—8 (0) 








4—l 


by (—1)'-" &, (2R) — &, (0) 


ie 2 
-1/? (2B) —84(0) 


and all the notation is the same as in the previous section. 


The condition for solvability of system (2.5) is 


R 2(c,—1) VP FW 2/Wy2—C,/3 (C,—1) 


te— = — 
: vs c,v, In(1+v,~’) V2 +W.22/W,2—[In(14+v,-) ]- 





? 


and the required radius R is the least positive root of this equation. Notice that, just as in the
computations of a in Table 2, we have neglected the dependence on R of the matrix elements $w_{12}$
and $w_{22}$ when evaluating R; this considerably simplifies the working. The results are quoted in
Table 3, where they are compared with the exact values $R_T$ of [13], and also with the values $R_{S16}$,
computed in the $S_{16}$ approximation (see [13]). The values $R_{DT}$ in Table 3 (see [13]) refer to the


TABLE 3 








1.05 = 1.40 





3.1839 (0.40) 1 

764 (0.01) 3.1719 (0.001) 4 
2772 3.1720 L. 
1 

it 


.9794 (0.30) 
.9866 (0.07) 
9854 
) 


0 (0.14) 


D 
83 


9 
9853 


3.1690 (0.10) 
3.1720 








improved diffusion method [1], while the $R_0$ values were computed in the asymptotic approximation
($H_2^{(1)} = 0$). We quote in parentheses the relative deviations (in %) from $R_T$.


The Rp values were computed from the asymptotic formula 
2(c,—1) 
c,v, In (4+v,7*) 





Now consider the problem of finding the critical radius of a sphere, surrounded by an
infinite reflector. Given a constant free path length, the problem can again be reduced in this case
(see [1, 9]) formally to the plane case. As before, the pseudo-distribution $\Psi(r, \mu)$ is antisymmetric
(see (2.2)), it must be continuous at r = R, and it must vanish at infinity. For $0 < r < R$, it is
described by the expression (2.3). For $R < r < \infty$, using Case's method, we can write
i 
er c e€ 
Y.(7,a)— "Seas + = vi, (v) dv 


Vo2—U V—U 
0 


+ 0(u) H,(|ul)A.(u) en", 
0(u)= { 


—r/v 


i. u>0, a7) 


0, u<0. 


On considering the problem strictly, just as for example in [3], we can show that $H_2^{(1)}(|\mu|) \sim e^{-R/|\mu|}$,
$H_2^{(2)}(|\mu|) \sim e^{R/|\mu|}$ and $H_2^{(1)}(1) = H_2^{(2)}(1) = 0$. For an approximate description of $H_2^{(1)}(|\nu|)$ we
shall use (2.4), while we write $H_2^{(2)}(|\nu|)$ as

$$H_2^{(2)}(|\nu|) = (1 - |\nu|)\,e^{R/|\nu|}\,H_2^{(2)}. \qquad (2.4')$$

This form of $H_2^{(2)}(|\nu|)$ ensures that $d\Psi(r)/dr$ is divergent when approaching r = R from the right.


TABLE 4 











530 (1.04) 1.0357 (2.00) 


3.2943 (0.08) 6 
.6730 (0.03) 1.0706 (0.01) 
6 
6 


1 
3.2902 (0.06) 1 
3.2923 1 
3.2940 (0.06) 1 


724 1.0705 
721 (0.02) 1.0697 (0.08) 











We replace the exact continuity condition on the interface by the approximate condition 


+1 +% 
few, (R,u)du = [uw ,(R,u)du, (2.8) 


0 0 


where k = 0, 1. The choice (2.8) of the boundary conditions ensures that the asymptotic flux
jumps at the interface, this being one of its known properties (see [1, 9]). After substituting (2.3)
and (2.7) and using (2.4), (2.4') in (2.8), we obtain a system of homogeneous equations in $H_1^{(1)}$,
$H_2^{(1)}$, $H_1^{(2)}$, $H_2^{(2)}$. The least positive root of its determinant is in fact the required critical radius
R of the sphere with the infinite reflector. The computational results for R are given in Table 4;
$c_2 = 0.99$. The values $R_0$ refer to the asymptotic approximation ($H_2^{(1)} = H_2^{(2)} = 0$); $R_T$
is the value obtained in [6] for a large number of iterations in the strict scheme of solution, using
Case's method; $R_{DT}$ is the value computed by us from the improved diffusion theory of [1]; the
relative deviations (in %) from $R_T$ are quoted in parentheses.


The quantity Rp was computed from the expression 


R, (c,—1) C2V2 In(1—v,~) 


t a 2) 
. Vi CyVi In(1+v,7?) (1—c,) 
When computing R, we neglected the dependence on R in expressions of the type 


—2R/v 


Wines -<={ fu du | fv(1—v) ée 
i dae) 2 
0 


“a 
0 





+ = (4—lulye-nem A, (u) |}. 


Cy 


Where more detailed computations are required, the functions $I_n^{(\pm)}(x)$, $\mathcal{E}_n(x)$ can easily be
tabulated.


Let us now indicate some features of our approach in the case when the outer layers are finite, 
taking as our basis some of the conclusions of [8]. For clarity, we consider the two-zone reactor. 
Assume, as before, that the inner sphere consists of neutron multiplying material. The neutron 
pseudo-distribution in it will be described by expression (2.3) in the approximation (2.4). We can 
write in the shell (r > 0), using Case’s method and symmetry considerations, 





Vstrsu) = oie Sha) web (o/vy) 
2X\"5 — #£2 ; 


+ H 





(2) V2 ch(r/v.) + ush(r/v2) 
; y.2—u? 


—r/v 


/v (2) er! 
AEE og dy 
u vtu 


+[0(w) Hs” (Jul) — 0(—u) H,” (Jul) Jas (w) en". 


It can be shown in the same way as in [8] by a strict consideration of the problem that, in 
view of the continuity conditions on the interface, and the absence of W2(R.,u) for u<0 we 
have H,*”’ (1)=0, Hy” (\ul) ~ e®™!, H®)(|u|)~ e-"’'™!, where R is the radius of the central 
sphere in free path lengths, and Rg is the shell radius in the same units. These facts imply that the 
derivative d¥(r)/dr has a singularity both for r = R and for r= Rg. Hence we can seek H (1?) (|v|) 


as 


Hy? (Wwl)=(1-Ivl) Hs? e®™!, HS? (lvl) =(1— Iw) He? eo, 


The exact boundary conditions, for continuity of W (r, u) atr =R, and for ¥2(ARo, u<0)=0 are 
replaced by the conditions 


—{ 
J W, (Ry, u) w+! du=0, 


+1 = 
J W,(R, w) w+! du= | W,(R, w)u** du, 
0 0 
where k = 0, 1. This system of equations in fact enables us to find the unknown coefficients H, while the
condition for solvability of the system provides the equation for finding the critical dimension of
the system.


3. The range of application. Conclusions 











FIG. 1. Curves: 1 for c = 0.6, 2 for c = 0.8, 3 for c = 0.9, 4 for c = 0.99, 5 for c = 1.2, 6 for c = 1.4, 7 for c = 1.6.


TABLE 5 








4.4128 (0.73) 4.3751 (0.13) 
2.1105 (0.25) 2.1020 (0.15) 
1.2617 (1.08) 1.2740 (0.12) 
4.7352 (0.84) 4.6886 (0.15) 
2.3082 (0.53) 2.2922 (0.17) 
1.3798 (0.61) 1.3862 (0.15) 
Our description of the non-diffusion term in the neutron flux expression primarily utilizes 
the fact that the coefficient of the eigenfunction expansion of the continuous part of the spectrum 
v in Case’s method has a zero at | v | = 1. The reason for the vanishing of this function at | vy | = 1, 
both when considering Milne’s problem, and when considering critical problems, is that the following 
condition proves to hold: 


=g(c, v). 


In Fig. 1 we show curves of g(c, v) for several values of c. It can be seen that, as c decreases 
(the absorption without neutron reproduction increases) a maximum occurs in g(c, v) as a function 
of v, i.e., the dependence becomes markedly non-linear. It is useful in this connection to set a 
limit to the range in which our method is applicable, dependent on the absorption in the shell. It 
turns out that, e.g., when computing the critical dimensions of spheres with a reflector, the 
approximation (2.4), (2.4’) has a wide range of application. In Table 5 we give the critical radii R 
of the sphere with infinite reflector when there is substantial absorption in the latter, computed 
by the method described in Section 2. The $R_0$ values were found from (2.9). The $R_T$ values are
borrowed from [15], where they are obtained for a large number of iterations in the strict scheme
of solution using Case's method [2]. The relative deviations (in %) from $R_T$ are quoted in
parentheses.


In short, our description of the non-diffusion component of the neutron flux enables the
critical dimension to be computed with satisfactory accuracy. The accuracy is no worse than in
the $S_{16}$ approximation of Carlson's method or the improved diffusion method [1]. As compared
with the former, it has the advantage of being less laborious, and as compared with the latter, it
also enables the behaviour of the neutron flux close to the interface to be described. For instance,
for the homogeneous sphere, the neutron flux is equal to

$$\Psi(r) = \Psi_{as}(r) + \Psi_{tr}(r),$$


where 


Hq” sin (r/v;) p (r) = yi? EolRtr) ai &,(R—r) 


Cis ? r 





: ee : (1), py (A) 
the expression for $\mathcal{E}_n(x)$ is given above, while $H_2^{(1)}/H_1^{(1)} = -w_{11}/w_{12}$; $w_{11}$, $w_{12}$ are given
in Section 2, and R is the least positive root of Eq. (2.6).

Notice that $|\Psi_{tr}(r)|$ takes its maximum value on the interface with the vacuum r = R, and
$\Psi_{tr}(0) = 0$. If it is necessary to refine the angle-space flux, the scheme described in Section 1
may be used; a simple illustrative example showed that this scheme is quite efficient.


Translated by D. E. Brown 
NON-LINEAR MATHEMATICAL PROBLEMS OF
THE TRANSMISSION OF EXCITATORY AND INHIBITORY PULSES
IN NERVE TISSUE*

S. F. MOROZOV and I. P. SMIRNOV
Gor'kii
(Received 30 June 1975)

*Zh. vychisl. Mat. mat. Fiz., 17, 1, 149-161, 1977.

FOR A non-linear integro-differential system of equations of the transmission of excitatory and
inhibitory pulses in nerve tissues, in the stationary and non-stationary cases, existence and uniqueness
theorems of the solution, a priori estimates and some properties of the solution are established.

Consideration of the single-velocity equations of the transmission of excitatory ($\psi_+$) and
inhibitory ($\psi_-$) pulses in nerve tissues (see [1]) leads to a study of a non-linear integro-differential
system of equations of the following form:

$$\frac{\partial}{\partial t}\psi_+(s, P, t) + (s, \nabla)\psi_+(s, P, t) + \Sigma^+(s, P)\,\psi_+(s, P, t) = \int_\Omega W_+(s\cdot s')\,\Sigma^+(s', P)\,\psi_+(s', P, t)\,ds' + Q^+(s, P)\,F_+(\psi_+, \psi_-),$$

$$\frac{\partial}{\partial t}\psi_-(s, P, t) + (s, \nabla)\psi_-(s, P, t) + \Sigma^-(s, P)\,\psi_-(s, P, t) = \int_\Omega W_-(s\cdot s')\,\Sigma^-(s', P)\,\psi_-(s', P, t)\,ds' + Q^-(s, P)\,F_-(\psi_+, \psi_-), \qquad (1)$$

with the initial and boundary conditions

$$\psi_\pm(s, P, t) = \varphi'_\pm(s, P, t), \quad P \in \Gamma, \quad (n(P), s) < 0, \quad t \in [0, T], \qquad (2)$$

$$\psi_\pm(s, P, t)\big|_{t=0} = \psi_\pm^{(0)}(s, P). \qquad (3)$$

Here $P = \{x_1, x_2, x_3\} \in G$, where G is a convex domain in $E^3$ with the smooth boundary $\Gamma$, n(P) is the
outward normal at the point $P \in \Gamma$, $s = \{s_1, s_2, s_3\} \in \Omega$, where $\Omega$ is the unit sphere in $E^3$, $t \in [0, T]$,
$s\cdot s' = s_1 s_1' + s_2 s_2' + s_3 s_3'$.

The non-linear operators $F_\pm(\psi_+, \psi_-)$ are considered in the McCulloch-Pitts approximation
(section 1).

In the present paper we study successively the stationary problem (section 2), for which
we establish existence and uniqueness theorems of the solution, a priori estimates in terms of the
data of the problem, and also some properties of the solution following from additional conditions
on the coefficients, and the non-stationary problem (section 3), for which theorems of the
existence and uniqueness (and stability in 1) of the solution are established. Section 1 is devoted
to the definition of the fundamental spaces and operators of the problem and to an investigation
of their properties.


1. Fundamental definitions 


1. The coefficients. We assume that the functions $\Sigma^\pm(s, P)$, $Q^\pm(s, P)$ are measurable with
respect to the ensemble of variables for $(s, P) \in \Omega\times G$ and satisfy almost everywhere in $\Omega\times G$
the following constraints:

$$0 < \sigma^\pm \le \Sigma^\pm(s, P) \le \bar\Sigma^\pm < \infty, \quad \sigma^\pm, \bar\Sigma^\pm = \mathrm{const},$$

$$|Q^\pm(s, P)| \le \bar Q^\pm, \quad \bar Q^\pm = \mathrm{const}.$$

The functions $W_\pm(s\cdot s')$ are integrable on $\Omega\times\Omega'$; $\xi_\pm(s, P)$, $\eta_\pm(s, P)$, $\mu_\pm(P, P')$, $s \in \Omega$, $P, P' \in G$,
are coefficients occurring in the construction of the non-linear operators $F_\pm(\psi_+, \psi_-)$
(see below), and measurable with respect to the ensemble of variables in the corresponding domains
of definition.

2. The fundamental spaces. In what follows we use the Banach spaces $H_p^\pm$ of functions
$\psi(s, P)$ measurable on $\Omega\times G$ with norms

$$\|\psi\|_p^\pm = \left[\int_{\Omega\times G} \Sigma^\pm(s, P)\,|\psi(s, P)|^p\,ds\,dP\right]^{1/p}$$

and their product $H_p = H_p^+\times H_p^-$ with norm $\|\psi\|_p = [(\|\psi_+\|_p^+)^p + (\|\psi_-\|_p^-)^p]^{1/p}$, $\psi = \mathrm{col}\{\psi_+, \psi_-\}$,
$1 < p < \infty$. The Banach spaces of functions $\varphi(s, P)$, $(s, P) \in \Omega\times G$, possessing the
generalized derivative $(s, \nabla)\varphi \in H_p^\pm$, with norms $\|\varphi\|_{W_p^\pm} = \|\varphi\|_p^\pm + \|(s, \nabla)\varphi\|_p^\pm$, will be
denoted by $W_p^\pm$. We denote by $D_p^\pm$ the subspaces of functions $\psi(s, P) \in W_p^\pm$ satisfying the
condition $\psi(s, P) = 0$ for $P \in \Gamma_-$, where $\Gamma_-$ is the part of the boundary $\Gamma$ on which
$(s, n(P)) < 0$. Let $W_p = W_p^+\times W_p^-$, $D_p = D_p^+\times D_p^-$; $D_p$ and $W_p$ are everywhere dense in
$H_p$ [2].
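3. The operators. We define on $H_p$ the linear operator $S = \mathrm{diag}\{S_+, S_-\}$, where

$$S_\pm\psi_\pm = [\Sigma^\pm(s, P)]^{-1}\int_\Omega W_\pm(s\cdot s')\,\Sigma^\pm(s', P)\,\psi_\pm(s', P)\,ds', \quad \psi_\pm \in H_p^\pm.$$

The operators generated on $W_p^\pm$, $D_p^\pm$ by the linear differential expressions $l_\pm\varphi_\pm = [\Sigma^\pm(s, P)]^{-1}(s, \nabla)\varphi_\pm + \varphi_\pm$
will be denoted by $\mathcal{L}_\pm$ and $L_\pm$ respectively. On $W_p$ and $D_p$ respectively we
introduce the operators $\mathcal{L} = \mathrm{diag}\{\mathcal{L}_+, \mathcal{L}_-\}$, $L = \mathrm{diag}\{L_+, L_-\}$. Finally we define the
non-linear operators $F_\pm(\psi_+, \psi_-)$. For this purpose we introduce the linear operators $R_\pm: H_p \to L_q(G)$, $1 < q \le \infty$:

$$R_\pm\psi = \int_{\Omega\times G} \big[\xi_\pm(s', P')\,\psi_+(s', P') - \eta_\pm(s', P')\,\psi_-(s', P')\big]\,\mu_\pm(P, P')\,ds'\,dP', \quad \psi = \mathrm{col}\{\psi_+, \psi_-\}.$$

Let $F(x) = (u_0 + x)\,\theta(u_0 + x)$, $u_0 = \mathrm{const}$, $x \in (-\infty, \infty)$, $\theta(x) = \{1$ for $x > 0$; 0 for $x < 0\}$.
We define on $L_q(G)$ the non-linear superposition operator F by the equation

$$F\psi = F(\psi(P)), \quad \psi(P) \in L_q(G), \quad 1 < q \le \infty.$$

Then $F_\pm(\psi_+, \psi_-) = F R_\pm\psi$, $\psi = \mathrm{col}\{\psi_+, \psi_-\} \in H_p$.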
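The threshold function entering the superposition operator is simple enough to check numerically. The sketch below assumes the reading $F(x) = (u_0 + x)\,\theta(u_0 + x)$ with a unit step $\theta$; the function names and the value assigned to $\theta(0)$ are illustrative choices. It also spot-checks the two elementary properties used later in the text, $|F(x_1) - F(x_2)| \le |x_1 - x_2|$ and $|F(x)| \le |u_0| + |x|$.

```python
def theta(x):
    """Unit step: 1 for x > 0, 0 for x < 0 (the value at 0 is taken as 0 here)."""
    return 1.0 if x > 0.0 else 0.0

def threshold_response(x, u0):
    """F(x) = (u0 + x) * theta(u0 + x): zero below the threshold -u0, linear above it."""
    return (u0 + x) * theta(u0 + x)

u0 = 1.0
samples = [-3.0 + 0.25 * k for k in range(25)]  # grid on [-3, 3]

# Lipschitz constant 1 and linear growth, checked on the sample grid.
lipschitz_ok = all(
    abs(threshold_response(a, u0) - threshold_response(b, u0)) <= abs(a - b) + 1e-12
    for a in samples for b in samples
)
growth_ok = all(
    abs(threshold_response(a, u0)) <= abs(u0) + abs(a) for a in samples
)
print(lipschitz_ok, growth_ok)
```

A grid check of this kind illustrates the inequalities but does not prove them; the proofs follow from the piecewise-linear form of F.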
4. The properties of the operators. The following lemmas hold. 


Lemma 1 (see [2, 3]).

The operators $\mathcal{L}: W_p \to H_p$, $L: D_p \to H_p$ are bounded. The operator $L^{-1}: H_p \to D_p$
exists and

$$\|L^{-1}\|_p \le \max\{1 - \exp(-\bar\Sigma^+ d);\ 1 - \exp(-\bar\Sigma^- d)\}, \quad d = \mathrm{diam}\,G < \infty.$$

Here and below $\|A\|_p = \|A\|_{H_p \to H_p}$ for $A: H_p \to H_p$.

Lemma 2 (see [2]).

The operator $S: H_p \to H_p$ is bounded and the following estimate holds:

$$\|S\|_p \le \max\left\{\int |W_+(\gamma)|\,d\gamma;\ \int |W_-(\gamma)|\,d\gamma\right\}.$$

Lemma 3 (see [2, 3]).
The operator $L^{-1}S: H_p \to H_p$ is completely continuous.
Lemma 4. 
Almost everywhere in G let 


1/7. 


[ J ete P’)p.(P,P’) |" ds’ aP’ | “<C,4, 


1/@, 


| J inate? P’) wa (P, P’) |*2 ds’ aP’ | "<C,4 


and almost everywhere in (XG _ let 


4 [= (s,P) pw. (P’, P) |% dP’ | 


G 





{ Ina (s,P) w2(P’, P) |" dP’ | 





G 


<p, (p—o.z)/(p—1)<rs, (P— 64)/ 


where C,*,..., C,* are constant numbers, o., 6.< 
(G) are completely continuous and the following 


(p—1)<@z. Then the operators R.: FE, > Ly 
estimates hold 


e +) i—o,/p (C,*) 0,/p (An V) i/p'’—1/ryto./prs 


/p' ee i . 
| Rl] 4#)+2,(¢) S 2” P max { (ot) */? ? 





(o-) 





(€,*) pO (E > * ./p (4x) ‘) i/p'—1/0,+5,/po, | 


where p’=p/(p—1), V=mes G. 


Lemma 5. Let conditions (4), (5) be satisfied for some rz, w.>1/p’. Then the operators R.: 
H,—L..(G) are continuous. 


Proof of Lemmas 4, 5. We represent the operators $R_\pm: H_p \to L_q(G)$, $1 < q \le \infty$, in the
form $R_\pm\psi = R_\pm^{(1)}\psi_+ - R_\pm^{(2)}\psi_-$, $\psi = \mathrm{col}\{\psi_+, \psi_-\}$, where

$$R_\pm^{(1)}\psi_+ = \int_{\Omega\times G} \xi_\pm(s', P')\,\mu_\pm(P, P')\,\psi_+(s', P')\,ds'\,dP',$$

$$R_\pm^{(2)}\psi_- = \int_{\Omega\times G} \eta_\pm(s', P')\,\mu_\pm(P, P')\,\psi_-(s', P')\,ds'\,dP'.$$

It follows from [4] that when the conditions of Lemma 4 are satisfied the operators $R_\pm^{(1)}: H_p^+ \to L_p(G)$,
$R_\pm^{(2)}: H_p^- \to L_p(G)$ are completely continuous. But then $R_\pm: H_p \to L_p(G)$ are
also completely continuous. Moreover, the conditions of Lemma 5 ensure the continuity of the
operators $R_\pm^{(1)}: H_p^+ \to L_\infty(G)$, $R_\pm^{(2)}: H_p^- \to L_\infty(G)$, and therefore also the continuity of the
operators $R_\pm: H_p \to L_\infty(G)$. Estimates for the norms $\|R_\pm\|_{H_p \to L_p(G)}$ follow from the estimate







9) 
I-,(axe Ly (G) | Rs IL, (@XG)-+Lp (G) | 


DUP’ bap |! 
(ory? : es a 





and estimates for the norms ||R{” ||:,,axc)+zpic), are given in [4]. 


Lemma 6. 


The superposition operator F acts from $L_q(G)$ into $L_q(G)$ for $1 < q \le \infty$, and is continuous
and bounded on every sphere in $L_q(G)$ for $1 < q \le \infty$. The operators $FR_\pm$ are completely
continuous from $H_p$ into $L_p(G)$ when the conditions of Lemma 4 are satisfied, $1 < p < \infty$.

The proof of the lemma follows from the continuity of the function F(x) (see subsection
3) and the estimate $|F(x)| \le |u_0| + |x|$, $x \in (-\infty, \infty)$ (see [5], p. 312).


2. The stationary problem 


1. Statement of the problem. We consider the stationary problem for the integro-differential
system

$$(s, \nabla)\psi_+(s, P) + \Sigma^+(s, P)\,\psi_+(s, P) = \int_\Omega W_+(s\cdot s')\,\Sigma^+(s', P)\,\psi_+(s', P)\,ds' + Q^+(s, P)\,F_+(\psi_+, \psi_-),$$

$$(s, \nabla)\psi_-(s, P) + \Sigma^-(s, P)\,\psi_-(s, P) = \int_\Omega W_-(s\cdot s')\,\Sigma^-(s', P)\,\psi_-(s', P)\,ds' + Q^-(s, P)\,F_-(\psi_+, \psi_-) \qquad (6)$$

with the boundary condition

$$\psi_\pm(s, P) = \varphi'_\pm(s, P), \quad P \in \Gamma_-. \qquad (7)$$

In what follows we will consider only those boundary conditions $\varphi'(s, P)$ which permit
continuation onto $W_p$, that is, a $\varphi \in W_p$ is found such that $\varphi = \varphi'$ for $P \in \Gamma_-$ (on the subject of
continuation see [6]). A generalized $L_p$-solution of the system (6), (7) is defined as a function
$\psi(s, P) \in W_p$ for which

$$\mathcal{L}\psi = S\psi + B'\psi, \quad \psi - \varphi \in D_p,$$

$$B'\psi = \mathrm{col}\left\{\frac{Q^+(s, P)}{\Sigma^+(s, P)}\,F R_+\psi,\ \frac{Q^-(s, P)}{\Sigma^-(s, P)}\,F R_-\psi\right\}.$$

For the purpose of studying system (6), (7) we introduce the following operator equation in the
space $H_p$:

$$\chi = L^{-1}S\chi + L^{-1}B\chi + f \equiv A\chi, \qquad (8)$$

where $B\chi = B'(\chi + \varphi)$, $f = L^{-1}S\varphi - L^{-1}\mathcal{L}\varphi$. The following lemma holds.

Lemma 7.

Let $\chi \in D_p$ be the solution of (8), then $\psi = \chi + \varphi$ is the $L_p$-solution of the system (6),
(7). Conversely, if $\psi$ is the $L_p$-solution of (6), (7), then $\chi = \psi - \varphi$ satisfies (8).

Therefore, the question of the existence and uniqueness of the $L_p$-solution of system (6), (7)
reduces to the question of the existence and uniqueness of the fixed point of the operator $A: H_p \to H_p$
in $H_p$.
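2. Lemma 8.

The operator $A: H_p \to H_p$ satisfies the Lipschitz condition $\|A\psi_1 - A\psi_2\|_p \le a_p\|\psi_1 - \psi_2\|_p$,
$\psi_{1,2} \in H_p$, where $a_p = \|L^{-1}S\|_p + \|L^{-1}\|_p k_p$,

$$k_p^p = 4\pi\bar\Sigma^+\left(\frac{\bar Q^+}{\sigma^+}\right)^p \|R_+\|^p_{H_p \to L_p(G)} + 4\pi\bar\Sigma^-\left(\frac{\bar Q^-}{\sigma^-}\right)^p \|R_-\|^p_{H_p \to L_p(G)}.$$

Proof. It follows from the form of the function F(x) (subsection 3) that $|F(x_1) - F(x_2)| \le |x_1 - x_2|$,
$x_{1,2} \in (-\infty, \infty)$. Then

$$\|B\psi_1 - B\psi_2\|_p^p = \left\{\left\|\frac{Q^+(s, P)}{\Sigma^+(s, P)}\big[FR_+(\psi_1 + \varphi) - FR_+(\psi_2 + \varphi)\big]\right\|_p^+\right\}^p + \left\{\left\|\frac{Q^-(s, P)}{\Sigma^-(s, P)}\big[FR_-(\psi_1 + \varphi) - FR_-(\psi_2 + \varphi)\big]\right\|_p^-\right\}^p$$

$$\le 4\pi\bar\Sigma^+\left(\frac{\bar Q^+}{\sigma^+}\right)^p \|R_+(\psi_1 - \psi_2)\|^p_{L_p(G)} + 4\pi\bar\Sigma^-\left(\frac{\bar Q^-}{\sigma^-}\right)^p \|R_-(\psi_1 - \psi_2)\|^p_{L_p(G)} \le k_p^p\,\|\psi_1 - \psi_2\|_p^p$$

and therefore

$$\|A\psi_1 - A\psi_2\|_p \le \|L^{-1}S(\psi_1 - \psi_2)\|_p + \|L^{-1}(B\psi_1 - B\psi_2)\|_p \le a_p\|\psi_1 - \psi_2\|_p.$$

Theorem 1.
Let

$$a_p < 1, \qquad (9)$$

then Eq. (8) (system (6), (7)) has a unique solution.

The proof follows from the contraction mapping principle [5]. Condition (9) can be
checked numerically by using the estimates given in Lemmas 1-4.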
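The mechanism behind Theorem 1 can be illustrated on a finite-dimensional stand-in. The sketch below is a toy, not a discretization of Eq. (8): a 2-vector replaces the function $\chi$, a small matrix stands in for $L^{-1}S$, a scaled threshold term stands in for $L^{-1}B$, and all numerical values are invented so that the Lipschitz constant in the max norm is about 0.58. Successive approximations then converge to the same fixed point from any start.

```python
def threshold_response(x, u0=0.5):
    """Same as (u0 + x) * theta(u0 + x)."""
    return max(u0 + x, 0.0)

def apply_a(chi):
    """Toy map A(chi) = M chi + nonlinear term + f on R^2 (hypothetical numbers)."""
    m = ((0.3, 0.1), (0.05, 0.2))        # stands in for L^{-1} S
    f = (1.0, 0.5)                        # stands in for the free term f
    drive = 0.4 * chi[0] - 0.3 * chi[1]   # stands in for R chi
    gains = (0.25, 0.15)                  # stand in for L^{-1} applied to Q / Sigma
    return tuple(
        m[i][0] * chi[0] + m[i][1] * chi[1] + gains[i] * threshold_response(drive) + f[i]
        for i in range(2)
    )

def fixed_point(start, tol=1e-12, max_iter=1000):
    """Successive approximations chi_{n+1} = A(chi_n)."""
    chi = start
    for _ in range(max_iter):
        nxt = apply_a(chi)
        if max(abs(nxt[i] - chi[i]) for i in range(2)) < tol:
            return nxt
        chi = nxt
    raise RuntimeError("iteration did not converge")

from_zero = fixed_point((0.0, 0.0))
from_far = fixed_point((100.0, -50.0))
print(from_zero, from_far)
```

In the toy the bound on the Lipschitz constant is the largest row sum of the linear part plus the gain times the weight sum of the drive, the analogue of $a_p = \|L^{-1}S\|_p + \|L^{-1}\|_p k_p$.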
3. We note that when the conditions of Lemma 4 are satisfied the non-linear operator
$A: H_p \to H_p$ is completely continuous. We explain the conditions for which Schauder's principle
holds for A [5]. Let $\|\psi + \varphi\|_p \le r$, then

$$\|A\psi + \varphi\|_p \le \|L^{-1}S\|_p\,r + \|L^{-1}\|_p\|B\psi\|_p + \|\varphi - L^{-1}\mathcal{L}\varphi\|_p.$$
Or 


IByl-r<and* (—) WRGIE, 2200) 


+4n2- (=)’ R—llse, +10) id 


+f luol? f | 3*(s,P) | oon de 


2XG 


EN Q-(s,P) 1” aa ; 
‘nguit sey |asap b= x, r+ {o,P}, 


therefore $\|A\psi + \varphi\|_p \le \|L^{-1}S\|_p\,r + \|\varphi - L^{-1}\mathcal{L}\varphi\|_p + \|L^{-1}\|_p\{k_p^p r^p + c_p^p\}^{1/p} \equiv b_p$. Therefore, if
$b_p \le r$, then the operator A maps the sphere $\|\psi + \varphi\|_p \le r$ into itself and Theorem 2 holds.

Theorem 2.
Let an $r > 0$ be found such that

$$b_p \le r, \qquad (10)$$

then Eq. (8) has at least one solution in the sphere $\|\psi + \varphi\|_p \le r$.


However, it is easy to show that conditions (9) and (10) are equivalent: (10) implies (9) and
conversely, when (9) is satisfied an $r_0 > 0$ can be found such that (10) is satisfied for all $r > r_0$.
Nevertheless Theorem 2 permits us to obtain for the $L_p$-solution of (6), (7) an a priori estimate in
terms of the data of the problem, namely: for $a_p < 1$

$$\|\psi\|_p \le r_p,$$





{ llo—L"Zally | 


& 


TeX inf max 
O<e<(1-IL“*SI] , /IL“H A, 


{—IIL“'Sllp \ -t/p 
Real ae Yee a | 
: L( IZ—"llp :) 


4. For practical estimates it is interesting to know beforehand, in terms of the data of the
problem, the behaviour relative to each other of the components of the solution $\psi = \mathrm{col}\{\psi_+, \psi_-\}$
and also the conditions ensuring the existence of a solution non-negative almost everywhere in
$\Omega\times G$.

Let $K \subset H_p$ be a cone of the form

$$K = \{x = \mathrm{col}\{x_+, x_-\} \in H_p:\ x_+(s, P) \ge x_-(s, P) \ge 0 \text{ almost everywhere in } \Omega\times G\}.$$
We introduce the linear operator $R: H_p \to H_p$ as follows:

$$Rx = \mathrm{col}\left\{\frac{Q^+(s, P)}{\Sigma^+(s, P)}\,R_+x,\ \frac{Q^-(s, P)}{\Sigma^-(s, P)}\,R_-x\right\}, \quad x \in H_p.$$

Lemma 9.

Let the conditions of Lemma 4 hold. Then the operator $R: H_p \to H_p$ is completely
continuous and $\|R\|_p \le k_p$.


We suppose that almost everywhere in the corresponding domains of definition the following
conditions are satisfied:

$$\xi_+(s, P) \ge \xi_-(s, P) \ge \eta_-(s, P) \ge \eta_+(s, P) \ge 0, \qquad \mu_+(P, P') \ge \mu_-(P, P') \ge 0,$$

$$W_+(s\cdot s')\,\Sigma^+(s, P - \xi s)\exp\left[-\int_0^\xi \Sigma^+(s, P - \xi' s)\,d\xi'\right] \ge W_-(s\cdot s')\,\Sigma^-(s, P - \xi s)\exp\left[-\int_0^\xi \Sigma^-(s, P - \xi' s)\,d\xi'\right] \ge 0, \qquad (11)$$

$$Q^+(s, P - \xi s)\exp\left[-\int_0^\xi \Sigma^+(s, P - \xi' s)\,d\xi'\right] \ge Q^-(s, P - \xi s)\exp\left[-\int_0^\xi \Sigma^-(s, P - \xi' s)\,d\xi'\right] \ge 0, \quad 0 \le \xi \le d.$$

Lemma 10.
Let $K_1$ be the cone of non-negative functions in $L_p(G)$. Then when (11) is satisfied, $x \in K$
implies $R_\pm x \in K_1$.

Lemma 11.

Let conditions (11) hold and $\varphi, f \in K$. Then the operator A is positive on K.

The proof follows from the monotonicity of the function F(x) and the representations
for the operators $L_\pm^{-1}S_\pm$, $L_\pm^{-1}$, given in [2].

Lemma 12.

Let $u_0 > 0$ (see section 1, subsection 3) and the conditions of Lemma 11 be satisfied. Then
the operator A has a strong asymptotic derivative $A'(\infty)$ with respect to the cone K [5] and

$$A'(\infty) = L^{-1}S + L^{-1}R.$$

Proof. Let $x \in K$, then

$$\|Ax - A'(\infty)x\|_p \le \|f\|_p + \|L^{-1}\|_p\,\|Bx - Rx\|_p. \qquad (12)$$
Since $R_\pm(x + \varphi) \in K_1$ for $x, \varphi \in K$ (Lemma 10) and consequently the functions $R_\pm(P) = R_\pm(x + \varphi)$
are non-negative almost everywhere in G, then $F(R_\pm(P)) = R_\pm(P) + u_0$ almost
everywhere in G. Therefore

$$\|Bx - Rx\|_p^p = \left\{\left\|\frac{Q^+(s, P)}{\Sigma^+(s, P)}\,(R_+\varphi + u_0)\right\|_p^+\right\}^p + \left\{\left\|\frac{Q^-(s, P)}{\Sigma^-(s, P)}\,(R_-\varphi + u_0)\right\|_p^-\right\}^p \equiv \mathrm{const}.$$

Therefore (12) implies that

$$\lim_{R\to\infty}\ \sup_{\|x\|_p \ge R,\ x \in K} \frac{\|Ax - A'(\infty)x\|_p}{\|x\|_p} = 0,$$

which is what was required to prove.
Theorem 3 (see [5], p. 419).

Let A be a completely continuous positive operator, $A'(\infty)$ its strong asymptotic derivative.
If the spectral radius of the operator $A'(\infty)$ is less than 1, then the operator A has in K at least one
fixed point.

Lemmas 9-12 and Theorem 3 enable us to make the following
statement.

Theorem 4.
Let the conditions of Lemmas 4, 12 and

$$\rho = \lim_{n\to\infty}\big[\|(L^{-1}S + L^{-1}R)^n\|_p\big]^{1/n} < 1 \qquad (13)$$

be satisfied. Then Eq. (8) has at least one solution in K.

Corollary 1. We note (see Lemma 9) that $\rho \le \|L^{-1}S + L^{-1}R\|_p \le \|L^{-1}S\|_p + \|L^{-1}\|_p k_p = a_p$.
Consequently, (13) holds if (9) is satisfied. Therefore subject to conditions (9) and (11) the unique
$L_p$-solution of the system (6), (7) $\psi = \mathrm{col}\{\psi_+, \psi_-\}$ possesses the property $\psi_+(s, P) \ge \psi_-(s, P) \ge 0$
almost everywhere in $\Omega\times G$.

Corollary 2. If in (11) the signs $\ge$ in the right sides of the inequalities are replaced by the
sign =, then if $a_p < 1$ for a unique $L_p$-solution of system (6), (7) we have $\psi_+(s, P) = \psi_-(s, P) \ge 0$
almost everywhere in $\Omega\times G$.
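Condition (13) is a spectral-radius condition, and Corollary 1 notes that it is implied by the norm condition (9). A small finite-dimensional illustration shows that the converse can fail, so (13) is the weaker requirement. The sketch below uses an invented 2 x 2 matrix in place of $L^{-1}S + L^{-1}R$ and the max-row-sum norm in place of $\|\cdot\|_p$; all names and numbers are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def max_row_sum_norm(a):
    """Operator norm induced by the max norm on vectors."""
    return max(sum(abs(v) for v in row) for row in a)

def gelfand_estimate(a, n):
    """[||a^n||]^(1/n), which tends to the spectral radius as n grows."""
    power = a
    for _ in range(n - 1):
        power = mat_mul(power, a)
    return max_row_sum_norm(power) ** (1.0 / n)

toy = [[0.5, 1.0], [0.0, 0.5]]  # both eigenvalues equal 0.5, but the norm is 1.5

norm_bound = gelfand_estimate(toy, 1)
radius_estimate = gelfand_estimate(toy, 200)
print(f"norm = {norm_bound:.3f}, estimate for n = 200: {radius_estimate:.3f}")
```

For this matrix the norm exceeds 1 while the estimate approaches 0.5, so the analogue of (13) holds although the analogue of (9) does not. The convergence in n is slow, so a large power is needed for a tight estimate.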


3. The non-stationary problem 
1. Statement of the problem. In this subsection we study the non-stationary problem (1)—(3). 


We assume that the boundary conditions y'(s, P, t) permit continuation onto W,,, that is, for all 
t=[0, T],, a p(t)=W,, can be found such that @(t)=q’(t) for P<T_.,, 1€[0, rT. 
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We define a generalized L,,-solution of problem (1)—(3) asa mapping p(t) : [0, T]>W,, 
possessingin 36, for all t=[0Q, 7] continuous strong derivative dw/dt, for which 


dyldt+Lp=Crp, —p(t)—@()ED,, —H(0) =pMeW, 
is satisfied for all t=[0, 7]. Here 


Y,=diag {X*(s, P)H,, =-(s, P) L-}, 
Cyp=col {2*(s, P) Sip. +Q*(s, P) FR», 
= (s, P)S-p_+Q-(s, P)FR-y}. 
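We suppose that $\varphi(t)$ has in $\mathcal{H}_p$ for all $t \in [0, T]$ a continuous strong derivative $d\varphi/dt$. For the purpose of studying system (1)-(3) we consider in $\mathcal{H}_p$ the following non-stationary equation:

$$\frac{d\psi}{dt} + L_1\psi = C(t, \psi), \qquad \psi(0) = \psi_1^0 \in D_1, \qquad (14)$$

$$L_1 = \mathrm{diag}\{\Sigma^+(s, P)L_+,\ \Sigma^-(s, P)L_-\},$$

$$C(t, \psi) = \mathrm{col}\Big\{\Sigma^+(s, P)S_+\psi_+ + Q^+(s, P)FR_+(\psi + \varphi(t)) + \Sigma^+(s, P)(S_+ - L_+)\varphi_+(t) - \frac{d\varphi_+}{dt},$$

$$\Sigma^-(s, P)S_-\psi_- + Q^-(s, P)FR_-(\psi + \varphi(t)) + \Sigma^-(s, P)(S_- - L_-)\varphi_-(t) - \frac{d\varphi_-}{dt}\Big\}.$$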


The solution of (14) is understood in the ordinary sense [7]. 


Lemma 13. 


Let $\chi(t)$ be a solution of (14) with $\psi_1^0 = \psi^0 - \varphi(0)$; then $\psi(t) = \chi(t) + \varphi(t)$ is an $L_p$-solution of the system (1)-(3). Conversely, if $\psi(t)$ is an $L_p$-solution of (1)-(3), then $\chi(t) = \psi(t) - \varphi(t)$ is a solution of (14) with $\psi_1^0 = \psi^0 - \varphi(0)$.


Therefore, the question of the $L_p$-solvability of system (1)-(3) has been reduced to the question of the existence and uniqueness of the solution of (14).


Lemma 14 (see [8] ). 
The operator $-L_1$ generates a $C_0$-semigroup $U(t)$ in $\mathcal{H}_p$.


2. In this subsection it is assumed that the function $F(x)$ (section 1, subsection 3) in an $\varepsilon$-neighbourhood of the point $-u_0$ is compressed in such a way that the compressed function $\tilde F(x) \in C^2$ satisfies the conditions

$$\tilde F(x) = F(x) \quad \text{if } x > -u_0 + \varepsilon \text{ and } x < -u_0 - \varepsilon,$$

$$\max_{-\infty < x < \infty} |\tilde F''(x)| = k < \infty.$$
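Lemma 15.


We suppose that $\varphi(t)$ has in $W_p$ for $t \in [0, T]$ a strong derivative $d\varphi/dt$ continuous with respect to $t$, and that $d\varphi/dt$ has in $\mathcal{H}_p$ a continuous strong derivative $d^2\varphi/dt^2$ for $t \in [0, T]$. Also let the conditions of Lemma 5 be satisfied. Then derivatives $C_t'(t, x)$, $C_x'(t, x)$ of the operator $C(t, x)$ exist, continuous with respect to the ensemble of variables:

$$C_t'(t, x) = \mathrm{col}\Big\{Q^+(s, P)\tilde F'R_+(x + \varphi(t))\,R_+\Big(\frac{d\varphi}{dt}\Big) + \Sigma^+(s, P)[S_+ - L_+]\Big(\frac{d\varphi_+}{dt}\Big) - \frac{d^2\varphi_+}{dt^2},$$

$$Q^-(s, P)\tilde F'R_-(x + \varphi(t))\,R_-\Big(\frac{d\varphi}{dt}\Big) + \Sigma^-(s, P)[S_- - L_-]\Big(\frac{d\varphi_-}{dt}\Big) - \frac{d^2\varphi_-}{dt^2}\Big\}, \qquad (15)$$

$$C_x'(t, x) = \begin{pmatrix} \Sigma^+(s, P)S_+ + Q^+(s, P)\tilde F'R_+(x + \varphi(t))R_+ & Q^+(s, P)\tilde F'R_+(x + \varphi(t))R_+ \\ Q^-(s, P)\tilde F'R_-(x + \varphi(t))R_- & \Sigma^-(s, P)S_- + Q^-(s, P)\tilde F'R_-(x + \varphi(t))R_- \end{pmatrix}. \qquad (16)$$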


In addition, $C_t'(t, x)$, $C_x'(t, x)$ satisfy a Lipschitz condition on $x$ ($C_t'(t, x)$ in the norm $\|\cdot\|_p$, and $C_x'(t, x)$ in the norm $\|\cdot\|_{\mathcal{H}_p \to \mathcal{H}_p}$).


The tilde over the operator symbol denotes the substitution $F(x) \to \tilde F(x)$; $\tilde F' = \tilde F'(\rho(P))$, $\rho(P) \in L_p(G)$.


Proof, Let *=I6;. We write ® (t)=(Si— L,) p(t) —dq / dt, ®’(t) =(S,—L,) 
(dp / dt)—d’p/dt?, S,=diag {=*(s. P)S,, 3-(s, P)S_}. We consider the norm of the 


difference 





| C(it+At, x) —C(t, x) 
H At 
fe a. 
 (t+At) —O (t) _0'(t) 
At " 


+| 0+ (s, P) FR, = Picton (x+@(t)) 


+ 


~C,' (t,2) | 








—F’R,(at+@(t)) Ry (=) | 
+[o- (s, P) FR- ae —FR_(z+q(t)) 





—F’R(e+@(t))R-( a i [lull -FllZellp* + allo. 
Because of the conditions of the present lemma and the continuity of the operators
S;, LZ, onW, (Lemma 1), |[J;|,>0 as AiO. Since ¥ (x) =C”, then for every At we 
can find a function 6,:(P), O<@,:(P)<1. measurable in G, such that ¥ (Ry (z + p(t + 
At))) —¥ (Ri(z + p(t))) = F"(Ri(z + p(t)) + Oae(P) LR (p(t+At) —@(t)) ]) 
R,(g(t+At)—@(t)) almost everywhere in G. 


Because of the boundedness of the operator 2, : 6,>L,(G), the strong continuity in t 
of the function y(t) and the continuity of ¥’(x), we have as At>0 


A,(At) =F’ (R, (c+ (t)) +Aa(P) [Rs (@ (t+At) —@(t)) J) 
+F'(R.(a+@(t))) =A, 


(18) 


with respect to the measure on G. 


Adding to and subtracting from /, an expression of the form A,(At)R.(d@/dt), it is 
easy to obtain the following estimate: 
g(itAt)—@(t) dg 





Halls*<Q* { 4a CUR lle, 1,0 


+|[[4.(ae)-.R, (=) a 


From the estimate | A,(At) —A2|<C, where C is a constant number, and (18) we have by 
Lebesgue’s theorem, ||J2||,+-~O as At-0O. Similarly, ||7;||,-~0O as At—+0. Therefore, 
(17) implies the validity of (15). We now show that (16) holds. Let x, h=H >. We have 


C(t, c+h) —C(t, 2) —C,’ (t, x) hill, 
<|IQ*(s, P) [F (R, (p(t) +zt+h)) —F (Rz (H(t) +2)) 
—F' (RK. (p(t) +2)) Rh] llp*+11Q-(s, P)(F (R-(p(t)+2th)) (19) 
~F (R_(o(t)+2)) —F’ (R_ (p(t) +2) ) R-hI lly 
=[Lelle*t+lZsllo~. 
Since F(x) eC", then forevery hed, afunction 0,(P),0<0,(P)<1, measurable on G 


can be found such that 
F(R, (p(t)+at+h)) —F (Ry (p(t) +2)) 


=$'(R.. (p(t) +r) +0,(P) Rsk) Rh. 


(20) 


By the boundedness of R, : ,—L,(G) and the continuity of F' (z) as ||h\|,~0, we have 


As(h) =¥#" (R, (p(t) +x) +0, (P) Rik) + F(R, (p(t) +z)) =A, (21) 


with respect to the measure on G. Substituting (20) into /4, we obtain the estimate 
Zellp+<Q*||A3(h) —Aallp tI ReAlleicsy 
<Q*|| Rx llvep+1~c)llAs(2) —Aally* llAll>. 
From (21), (22) and the estimate $|A_3(h) - A_4| < C$ we obtain by Lebesgue's theorem that
IZellp*/hllp--O as ll) O. Similarly, Zsllp~/l|l]>>O as |\llp-~O. By (19), this implies 
the validity of (16). To prove the last assertion of the lemma we obtain the necessary estimates. 

Let zx, y, hEF,, then 


Cy’ (t,x) Cr (ty) lp 


d 
<Q*[&. (SE) |], MRR +2) FR (oO tu) let 


+o-|[a (<2) |] wrR-@wW+e)-FR-@W ty) 


L(G) 


| , dp 
<(4n)‘/"k | Qt (xt)? R, ( ) Ralls, +2, co) 


dt L (6) 


so erefa(S) 


L(G) 


d 
x|[a- ( “) | "ee «| llz—yll><Mpllz—yll, 
4 dt L 4, (&) P P 


M,=const. 


Similarly, ILC.’ (t, xz) —C.’ (t, y) Jhll, 
< (4x) '/?k[ Q* (s+) “PIR, Ilsep+reo ay IR, Wien p(G) 
+Q-(E-)/? || R_llsep+e<ccyll R—lep-+rpce> ] l2—yl lll. 


Finally, we note that the continuity of C,’(t, x), Cx’(t, £) with respect to the ensemble of 
variables follows from the continuity of Y’(x) and the conditions on {£). 


Theorem 5 (see [7] ). 


Let the operator $-L_1$ generate a $C_0$-semigroup in $\mathcal{H}_p$. Let $C(t, x)$ have on $[0, T] \times \mathcal{H}_p$ partial derivatives $C_t'(t, x)$, $C_x'(t, x)$ (in the Fréchet sense), continuous with respect to the ensemble of variables and satisfying a Lipschitz condition on $x$. Then there exists a solution of (14) defined on some segment $[0, T_1] \subset [0, T]$.


Theorem 6. 


Let the conditions of Lemma 7 be satisfied. Then (14) has (for $\tilde F(x)$) a unique solution defined on some segment $[0, T_1] \subset [0, T]$.


3. Let $F(x)$ be the function introduced in section 1, subsection 3. Let $\varphi'(s, P, t) = \varphi'(s, P)$, $t \in [0, T]$. Then in $\mathcal{H}_2 = L_2(\Omega \times G) \times L_2(\Omega \times G)$ we can establish Theorem 8 on the existence and uniqueness of the solution of (14), which is a corollary of Pao's result (Theorem 7) and Lemmas 16, 17.


Theorem 7 (see [9] ). 


Let $-L_1$ be dissipative in $\mathcal{H}_2$ with constant $\beta$ and let the domain of values of the operator $\alpha I + L_1$, $\alpha > 0$, be identical with $\mathcal{H}_2$. If $C(t, \psi) = C(\psi)$ satisfies the Lipschitz condition

$$\|C(\psi) - C(\varphi)\|_2 \le k_1\|\psi - \varphi\|_2, \qquad \psi, \varphi \in \mathcal{H}_2, \quad k_1 = \mathrm{const},$$

then (14) has a unique solution. If in addition $\beta > 0$ and $k_1 < \beta$, then every equilibrium solution (see [9]), if it exists, is exponentially asymptotically stable.


Lemma 16. 
The operator —L, is strictly dissipative in Lp. 


The proof follows from the estimate (L,9, @)=(4i.@+, @-)+ (i-g-, p-) =(Lig 
p:)it(L-g-, p-)-=(@+, Qe) t+(Q-, p-)-=llpll“= a’ llqllz.”, which follows from the 
result (see [2]) (L.g., p.).=> (pz, @i)=. Here 

a*=min {o*, o-}, (z,y).= ( &*(s, P)z(s, P)y(s, P) ds aP. 


QXG 
Lemma 17. 


The operator C:#/,>4, satisfies a Lipschitz condition with constant 


b° P a 
ity = — [Silt BL (Qt) Rall stee0)+ (Q-) R-II eee 


where l°=max {X*, =-}. 


The proof of the lemma is similar to that of Lemma 8, if we take into account the fact that 
allglin<loll<bllglla. 


Theorem 8. 


Let $\varphi'(s, P, t) = \varphi'(s, P)$, $t \in [0, T]$. Then (14) has in $\mathcal{H}_2$ a unique solution on any interval $[0, T]$. Moreover, if $k_1 < a^2$, then every equilibrium solution [9], if it exists, is exponentially asymptotically stable.


Translated by J. Berry. 
THE SPATIAL KINETICS OF A PULSED HEAT-CAPACITY REACTOR*


A. D. KLIMOV, L. G. STRAKHOVSKAYA, R. P. FEDORENKO and 
I. L. CHIKHLADZE 


Moscow 


(Received 24 April 1975; revised 18 September 1975) 


A METHOD of integrating the three-dimensional kinetic equations describing the evolution of the 
neutron and temperature fields during a neutron burst in a pulsed heat-capacity reactor is presented. 


1. Statement of the problem 


Until recently transient processes in nuclear reactors were in the majority of cases described in the point model approximation (see [1]). However, this approximation was often found to be inapplicable.


A number of papers exist devoted to the development and extension of various methods of 
studying three-dimensional kinetics. The numerical methods that are most promising in their 
possibilities have been intensively applied in connection with the emergence of high-powered 
computers [2]. 


In this paper we present a method of integrating the non-stationary diffusion equations describing the kinetics of the neutron field of a pulsed heat-capacity reactor [3]. The method considered can be generalized to solve a wider class of applied problems of reactor physics. In particular, it makes it possible to investigate the kinetics of the neutron field in the control of criticality or of the neutron flux at a fixed point, to study depletion processes in systems allowing for local properties at each point of the active zone, etc.


To be specific we consider a neutron pulsed reactor of the IGR type in (r, z)-geometry. Before ignition the reactor is in the critical state with minimum controlled level of the neutron flux $\Phi_0$ and temperature $T_0$. To produce the pulse a reactivity jump is applied to the reactor. The energy emitted in the nuclear reactor is stored in the graphite of the active zone as heat, and as the stack heats up the burst is quenched because of the negative temperature effect.


In the initial stage of development of the pulse the steady period of the start-up is much less than the life-time of the sources of the delayed neutrons, so their effect can be neglected. Later their effect appears in the fact that the decay of the neutron flux proceeds more slowly.


For simplicity we consider a model in which the evolution of the neutron field is described 
in the two-group diffusion approximation: 


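$$\frac{1}{v_1}\frac{\partial \Phi_1}{\partial t} = \operatorname{div} D_1\nabla\Phi_1 - \Sigma_{a_1}\Phi_1 + (1 - \beta)\,\nu\Sigma_f\Phi_2 + \sum_{i=1}^{6}\lambda_i C_i, \qquad (1.1)$$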

*Zh. vychisl. Mat. mat. Fiz., 17, 1, 162-174, 1977. 
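$$\frac{1}{v_2}\frac{\partial \Phi_2}{\partial t} = \operatorname{div} D_2\nabla\Phi_2 - \Sigma_{a_2}\Phi_2 + \Sigma_{a_1}'\Phi_1, \qquad (1.2)$$

$$\frac{\partial C_i}{\partial t} = -\lambda_i C_i + \beta_i\,\nu\Sigma_f\Phi_2, \qquad i = 1, 2, \ldots, 6, \qquad (1.3)$$

$$\frac{\partial T}{\partial t} = \frac{\alpha\Sigma_f}{c(T)\,\gamma}\,\Phi_2, \qquad (1.4)$$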

where $\Phi_1$ is the flux of fast neutrons, cm$^{-2}$ sec$^{-1}$; $\Phi_2$ is the flux of thermal neutrons, cm$^{-2}$ sec$^{-1}$; $C_i$ is the density of the sources of delayed neutrons of the i-th type, cm$^{-3}$; $T$ is the temperature of the medium, °K; $D_1$, $D_2$ are the diffusion coefficients for fast and thermal neutrons, cm; $\Sigma_{a_1}$, $\Sigma_{a_2}$ are the absorption cross-sections for neutrons of the corresponding energy groups, cm$^{-1}$; $\Sigma_f$ is the fission cross-section, cm$^{-1}$; $\nu$ is the number of secondary neutrons; $\Sigma_{a_1}'$ is the "withdrawal" cross-section of the fast neutrons; $\beta_i$ is the fraction of delayed neutrons of the i-th type; $\beta$ is the effective fraction of the delayed neutrons; $\lambda_i$ is the decay constant of delayed neutrons of the i-th type, sec$^{-1}$; $c(T)$ is the specific heat of graphite, cal/degree; $\gamma$ is the density of the graphite, g/cm$^3$; $\alpha = 7.258 \times 10^{-12}$ cal/fission; and $v_1$, $v_2$ are the effective velocities of the corresponding energy groups of neutrons. [For thermal systems of the IGR type the approximation considered is fairly complete; the generalization to the case of the multigroup diffusion approximation is a formal standard procedure.]


The coefficients $D_i$, $\Sigma_{a_i}$, $\Sigma_f$, $v_2$, $c(T)$ are functions of the coordinates and of the temperature $T(\bar r, t)$.

Since after the characteristic time of development of the neutron pulse At<10_ sec the 
temperature adjusts itself because of the thermal conductivity at distances /~ (Ata) “<Aa.s. 
(where a is the thermal conductivity of the active zone cm~sec!, and R, , is the radius of the 
active zone), then in Eq. (1.4) we neglect the effect of thermal conductivity. There is no difficulty 
in allowing for this effect in (1.4) within the limits of the method considered. 

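The functions $\Phi_j(\bar r, t)$, $C_i(\bar r, t)$, $T(\bar r, t)$ satisfy the initial and boundary conditions

$$\Phi_j(\bar r, t)\big|_{t=0} = \Phi_{0j} = \mathrm{const}, \qquad j = 1, 2, \qquad (1.5)$$

$$C_i(\bar r, t)\big|_{t=0} = 0, \qquad i = 1, \ldots, 6, \qquad (1.6)$$

$$T(\bar r, t)\big|_{t=0} = T_0, \qquad (1.7)$$

$$\frac{\partial}{\partial r}\Phi_j(\bar r, t)\Big|_{r=0} = 0, \qquad (1.8)$$

$$\Phi_j(\bar r, t)\big|_{\Gamma} = 0, \qquad (1.9)$$

where $\Gamma$ is the extrapolation boundary of the reactor.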
Equation (1.3) is eliminated by integration and substitution of the corresponding terms in Eq. (1.1):

$$\sum_{i=1}^{6}\lambda_i C_i(\bar r, t) = \nu\sum_{i=1}^{6}\beta_i\lambda_i\exp(-\lambda_i t)\int_0^t \exp(\lambda_i t')\,\Sigma_f(\bar r, t')\,\Phi_2(\bar r, t')\,dt'. \qquad (1.10)$$


The choice of the method for the numerical solution of the problem is essentially determined 
by the characteristic features of the process of development of the neutron pulse described by 
Eqs. (1.1)—(1.9). These features are as follows. 


1. Three characteristic time scales exist, different from each other: $\tau_1 \ll \tau_2 \ll \tau$, where $\tau_1$ is the characteristic time for Eq. (1.1), $\tau_2$ is that for Eq. (1.2), and $\tau$ is the characteristic time of the process considered. The relation $\tau_1 \ll \tau_2$ is due to the fact that $v_1 \gg v_2$ ($v_1 \sim 10^3 v_2$), and after a time $\sim \tau_1$ the function $\Phi_2$ changes negligibly, but $\Phi_1$ becomes "steady-state" as the solution of the equation

$$L_1\Phi_1 + A_{11}\Phi_1 + A_{12}\Phi_2 \approx 0$$

with a given value of $\Phi_2$. The relation $\tau_2 \ll \tau$ is connected with the value of the coefficient in (1.4) and with the nature of the dependence of the coefficients of Eqs. (1.1) and (1.2) on the temperature $T(r, z, t)$. Times that are small from the point of view of the variation of $T(r, z, t)$ are in this problem large from the point of view of the variation of $\Phi_2$ (and all the more of $\Phi_1$).


2. Discontinuous coefficients. The whole domain of the calculation is subdivided into a comparatively large number of zones, in each of which its own values of the coefficients of the system are specified, the disparity in these values being fairly large. Therefore operators of the type

$$\frac{1}{r}\frac{\partial}{\partial r}\,rD\,\frac{\partial}{\partial r}, \qquad \frac{\partial}{\partial z}\,D\,\frac{\partial}{\partial z}$$

(and their difference approximations) are non-commutative, and this causes certain computational difficulties.


3. At the initial instant the distribution $T(r, z, 0)$, $\Phi_1(r, z, 0)$, $\Phi_2(r, z, 0)$ of the neutron background is specified.


By varying the absorption properties of the medium (withdrawal of the control rods from the active zone) the reactivity varies and the system becomes supercritical. This leads to an exponential increase in the functions $\Phi_1$ and $\Phi_2$ with a period determined by the supercriticality in the initial state.


On attaining some level of $\Phi_1$ and $\Phi_2$ an increase in temperature by Eq. (1.4) begins. The change in $T(r, z, t)$ leads to a change in the coefficients in Eqs. (1.1), (1.2), the rate of increase of $\Phi_1$, $\Phi_2$ slows down, and the system gradually passes from the supercritical state into the critical state, after which the increase in $\Phi_1$, $\Phi_2$ is replaced by a fall to a value at which the increase of temperature practically ceases: the system passes into the subcritical state. Therefore, after a time of the order of ~1 sec the functions $\Phi_1$ and $\Phi_2$ change by several orders:

1) in the first stage (exponential growth with constant period) $\Phi_1$ and $\Phi_2$ are varied by a factor of approximately $10^{12}$ compared with the background;


2) in the second stage, connected with the variation of the properties of the system (the coefficients), when the temperature is increased to the critical state the functions $\Phi_1$ and $\Phi_2$ are increased by a factor of $10^2$ to $10^3$, and the temperature is increased by ~1000°;


3) the third stage begins from the critical state, when $T$ continues to increase, and $\Phi_1$ and $\Phi_2$ are decreased by a factor of $10^2$ to 10 from the value of the function in the critical state;


4) on attaining asymptotically constant subcriticality the functions $\Phi_1$ and $\Phi_2$ continue to decrease with constant period determined by the value of the subcriticality.


Essentially the second and third stages of the process can be computed; the first and fourth 
stages are described asymptotically. 
2. Computational difficulties 


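Below it will be convenient for us to write the system of equations (1.1), (1.2), (1.4) in the form

$$Q\,\frac{\partial \Phi}{\partial t} = L\Phi, \qquad (2.1)$$

$$\frac{\partial T}{\partial t} = A_{32}\Phi_2, \qquad \Phi = (\Phi_1, \Phi_2), \qquad (2.2)$$

$$L = \begin{pmatrix} \operatorname{div} D_1\nabla + A_{11} & A_{12} \\ A_{21} & \operatorname{div} D_2\nabla + A_{22} \end{pmatrix}, \qquad Q = \begin{pmatrix} 1/v_1 & 0 \\ 0 & 1/v_2 \end{pmatrix}.$$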
The correspondence between the "physical" and "mathematical" notations for the coefficients (that is, between the quantities $\Sigma_{a_1}$, $\Sigma_{a_2}$, $\Sigma_f$ etc. and $A_{11}$, $A_{12}$ etc.) is easily established by a simple comparison. We note that the expression

$$A_{12}\Phi_2 + \sum_{i=1}^{6}\lambda_i C_i$$

in Eq. (1.1) is written in the form $A_{12}\Phi_2$ after using a formula of type (1.10). We do not discuss this in detail since in the calculations considered below the exact calculation of the delayed neutrons is not very important.


A special, fairly complex method was developed and used for calculating the neutron burst; 
it is described in detail in section 3. Here we will briefly describe formally possible obvious approaches 
to the problem and estimate the computational difficulties arising in them. 


1. The explicit difference scheme: 


$$\frac{1}{v_1}\frac{\Phi_1^{n+1} - \Phi_1^n}{\tau} = L_1\Phi_1^n + A_{11}\Phi_1^n + A_{12}\Phi_2^n, \qquad (2.3)$$

$$\frac{1}{v_2}\frac{\Phi_2^{n+1} - \Phi_2^n}{\tau} = L_2\Phi_2^n + A_{21}\Phi_1^n + A_{22}\Phi_2^n, \qquad (2.4)$$

$$\frac{T^{n+1} - T^n}{\tau} = A_{32}\Phi_2^n. \qquad (2.5)$$

Here $L_1$, $L_2$ are difference operators approximating the corresponding differential operators. Courant's well-known condition imposes a constraint on the step $\tau$. In this case it is determined by Eq. (2.3); for the values of $v_1$, the spatial mesh steps and the diffusion coefficients used in our calculations, Courant's condition gives $\tau \sim 10^{-7}$ to $10^{-8}$. The process must be considered on a time segment $t \approx 1$, that is, the number of steps in the explicit scheme must be $\sim 10^7$, which is completely unrealistic.
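The step-count estimate above can be reproduced with a short calculation. The sketch below is only illustrative: it uses the textbook stability bound $\tau \le h^2/(2d\,vD)$ for the explicit scheme of a pure diffusion equation on a uniform mesh in $d$ dimensions (the terms $A_{ij}$ are ignored), and the values of `v1`, `D1` and `h` are hypothetical, chosen only to land in the range quoted in the text; they are not the values used by the authors.

```python
def explicit_step_limit(v, D, h, dim=2):
    """Largest stable step of the explicit scheme for
    (1/v) dPhi/dt = div(D grad Phi) on a uniform mesh of spacing h:
    tau <= h**2 / (2 * dim * v * D)  (reaction terms neglected)."""
    return h * h / (2.0 * dim * v * D)


def number_of_steps(t_total, tau):
    """Number of explicit steps needed to cover the time segment t_total."""
    return t_total / tau


# Hypothetical illustrative values (assumptions, not taken from the paper).
v1 = 1.0e8   # effective velocity of the fast group, cm/sec
D1 = 1.0     # diffusion coefficient of the fast group, cm
h = 5.0      # spatial mesh step, cm

tau = explicit_step_limit(v1, D1, h)
steps = number_of_steps(1.0, tau)
print(f"tau = {tau:.3e} sec, steps for t = 1 sec: {steps:.3e}")
```

With these assumed values the admissible step is of order $10^{-8}$ to $10^{-7}$ sec, so covering one second requires more than $10^7$ explicit steps.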


2. After replacing (2.3) by a stationary elliptic equation we obtain a computing scheme of the type

$$L_1\Phi_1^n + A_{11}\Phi_1^n + A_{12}\Phi_2^n = 0, \qquad (2.6)$$

the remaining equations are the same as (2.4) and (2.5). In this case Courant's condition is determined by the quantity $v_2 \ll v_1$, $\tau \approx 10^{-4}$ to $10^{-3}$, and the number of steps is $n \approx 10^4$ to $10^5$. We note that formally each step requires the elliptic equation (2.6) to be solved; in reality, it can obviously be solved not at every time step but only after a large number of explicit steps by the scheme of (2.4); this remark refers to Eq. (2.5) also. Despite this simplification the problem remains too unwieldy.


3. The implicit difference scheme: 


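$$\frac{1}{v_1}\frac{\Phi_1^{n+1} - \Phi_1^n}{\tau} = L_1\Phi_1^{n+1} + A_{11}\Phi_1^{n+1} + A_{12}\Phi_2^{n+1}, \qquad A_{ij} = A_{ij}(T^n),$$

$$\frac{1}{v_2}\frac{\Phi_2^{n+1} - \Phi_2^n}{\tau} = L_2\Phi_2^{n+1} + A_{21}\Phi_1^{n+1} + A_{22}\Phi_2^{n+1},$$

$$\frac{T^{n+1} - T^n}{\tau} = A_{32}\Phi_2^{n+1}.$$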

In this case there are no constraints on the step $\tau$ for stability, but the constraint on $\tau$ connected with the approximation accuracy must be taken into account. It can be estimated by considering the solution by the implicit scheme of the simple equation with exponential solution

$$\frac{dx}{dt} = ax, \qquad \frac{x^{n+1} - x^n}{\tau} = ax^{n+1}, \quad \text{that is,} \quad x^{n+1} = \frac{x^n}{1 - a\tau}.$$

We compare the numerical solution $x_\tau(t) = x(0)(1 - a\tau)^{-t/\tau}$ with the exact solution $x(t) = x(0)e^{at}$. Making the substitution $(1 - a\tau)^{-1} \approx e^{a\tau}(1 + a^2\tau^2/2)$ (on the natural assumption that $a\tau \ll 1$), after obvious calculations we obtain

$$x_\tau(t) \approx x(t)\left(1 + \frac{a^2\tau t}{2}\right).$$

We have to obtain a solution increasing by a factor of $10^3$ (that is, $at = \ln 10^3$) with the relative accuracy $p\%$, that is, $a^2\tau t/2 = p \cdot 10^{-2}$, from which we obtain for the number of steps $n = t/\tau$ the estimate $n \approx \ln^2(10^3) \cdot 10^2/2p$. A similar estimate is obtained for calculating the third stage of the process also. Therefore, even accepting the low relative accuracy of 10%, we obtain for the number of steps the estimate $n \sim 500$; moreover, each of them requires a system of elliptic equations to be solved. Using recently developed efficient iterative methods of solving difference elliptic equations and the availability of an excellent initial approximation ($\Phi^{n+1}$ differs little from $\Phi^n$), the solution of the problem by the implicit scheme must be regarded as quite feasible on modern computers of the BESM-6 type, but all the same extremely laborious.
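The accuracy estimate above is easy to check numerically. The following sketch applies the implicit scheme to $dx/dt = ax$ with the number of steps given by $n \approx \ln^2(10^3)\cdot 10^2/2p$ and compares the computed growth with the exact factor $10^3$. The function names are illustrative. For $p = 10$ the formula gives about 240 steps for one stage, which appears consistent with the figure of roughly 500 quoted in the text if the second and third stages are counted together (this reading is an assumption).

```python
import math


def implicit_euler_growth(a, t, n):
    """Growth factor x_tau(t)/x(0) after n implicit steps for dx/dt = a*x."""
    tau = t / n
    return (1.0 - a * tau) ** (-n)


def steps_for_accuracy(growth, p_percent):
    """Estimate n ~ ln^2(growth) * 10^2 / (2 p) for a relative accuracy of p %."""
    return math.log(growth) ** 2 * 100.0 / (2.0 * p_percent)


growth = 1.0e3                 # the solution must increase by a factor of 10^3
p = 10.0                       # required relative accuracy, per cent
n_est = steps_for_accuracy(growth, p)
n = math.ceil(n_est)

a, t = math.log(growth), 1.0   # so that a*t = ln(10^3)
rel_err = implicit_euler_growth(a, t, n) / growth - 1.0
print(f"estimated steps: {n_est:.1f}, relative error with n = {n}: {rel_err:.3f}")
```

The measured relative error comes out close to the requested 10%, slightly above it because the estimate keeps only the leading term in $a\tau$.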


4. At first glance the method of alternating directions, not having stability constraints, permits the calculation to be made with the same step $\tau$ as in the implicit scheme, but with a considerably simpler algorithm for calculating $\Phi^{n+1}$, without the use of iterative methods. However, a more detailed analysis shows that this is by no means the case.


A numerical experiment was carried out to ascertain the possibilities of this approach. The system of equations

$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x}C\frac{\partial u}{\partial x} + \frac{\partial}{\partial y}C\frac{\partial u}{\partial y} + Av - A_1u,$$

$$\frac{\partial v}{\partial t} = \frac{\partial}{\partial x}D\frac{\partial v}{\partial x} + \frac{\partial}{\partial y}D\frac{\partial v}{\partial y} + Bu - B_1v,$$

$$u|_\Gamma = 0, \qquad v|_\Gamma = 0$$

was solved in the square $0 \le x, y \le 1$, subdivided into 25 equal parts in each of which the coefficients $A$, $A_1$, $B$, $B_1$, $C$, $D$ were constants. The discontinuities in these coefficients (at the boundaries of the parts) were chosen close to the discontinuities in the actual problem. The calculation was performed on a mesh with step $h = 1/60$, and on the whole we can speak of a simplified model of the actual problem.


The standard step of the method, defining the transition from the functions $u^n$, $v^n$, relating to the instant $t_n$, to the functions $u^{n+1}$, $v^{n+1}$, consisted of two parts.


The function $u^{n+1}$ is first found from the difference equation approximating the differential equation

$$(Cu_x)_x + (Cu_y)_y - A_1u + Av^n = 0, \qquad u|_\Gamma = 0.$$

The solution was found by $N_u$ iterations of the method of alternating directions with constant step $\tau_u$, the optimal value of which was calculated by estimating the minimal $l$ and maximal $L$ of the eigenvalues of the elliptic operator: $\tau_u \approx (lL)^{-1/2}$. The function $u^n$ was used as the initial approximation to $u^{n+1}$.


Then a step of the method of alternating directions is made in the equation

$$v_t = (Dv_x)_x + (Dv_y)_y + Bu^{n+1} - B_1v, \qquad v|_\Gamma = 0.$$

In this case $\tau_v$ is the time step and the functions $u^{n+1}$, $v^{n+1}$ relate to the instant $t_{n+1} = t_n + 2\tau_v$. The possibility of solving the problem with a fairly coarse step $\tau_v$ was investigated. The calculation was begun with the functions $u^0(x, y) = v^0(x, y) = xy(1 - x)(1 - y)$, and was continued up to the clear isolation of the exponentially increasing solution $u(t, x, y) = \exp(\lambda t)\,u^*(x, y)$, $v(t, x, y) = \exp(\lambda t)\,v^*(x, y)$. The calculations were protracted (1.5 to 2 hours on the BESM-6 computer programmed in FORTRAN), but the value of $\lambda$ was determined fairly reliably. The results of the experiment are shown in Table 1. Calculation 1 was performed with a very small value of $\tau_v$, so that the Courant number for the second equation was $K_v = 4D\tau_v/h^2 \approx 1$, with $N_u = 2$, and this was sufficient, since after one step of the method of alternating directions, that is, after a time $2\tau_v$, the function $v$ was changed by ~0.4%. The purpose of this calculation was to obtain the correct value of $\lambda$. The results give rise to no doubts; however, the solution by a similar method of the actual problem would require $\sim 5 \cdot 10^3$ steps.


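TABLE 1. Results of the numerical experiment (surviving column heading: "No. of calculation"; the table body is not recoverable from the source).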
In the following calculations 2 and 3 a considerably greater step $\tau_v$ was used, but this led to considerable errors in the value of $\lambda$, although in the functions $u^*(x, y)$, $v^*(x, y)$ the discrepancies were much smaller. The time step $\tau_v$ was not too great, so that $\lambda\tau_v \approx 0.05$ in calculation 2 and $\approx 0.1$ in calculation 3. Therefore the difference in the values of $\lambda$ cannot be explained by simple errors of approximation, similar to the error in the integration of the equation $u_t = \lambda u$ by the method of finite differences. Thus, the scheme of first-order accuracy $(u^{n+1} - u^n)/\tau = \lambda u^n$ gives a solution of the type $\exp(\lambda^* t)$, where $\lambda^* \approx \lambda(1 - \lambda\tau/2)$, that is, an error in $\lambda$ of about 5% for $\lambda\tau = 0.1$ and 2.5% for $\lambda\tau = 0.05$.
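The size of this approximation error can be checked directly. The sketch below computes the effective growth rate $\lambda^*$ defined by $u^{n+1} = e^{\lambda^*\tau}u^n$ for the first-order scheme and, for comparison, for the trapezoidal scheme $(u^{n+1} - u^n)/\tau = \lambda(u^{n+1} + u^n)/2$ of second-order accuracy. The function names are illustrative, and $\lambda$ is set to 1 so that $\tau$ plays the role of the product $\lambda\tau$.

```python
import math


def effective_rate_explicit(lam, tau):
    """lambda* for (u^{n+1} - u^n)/tau = lam*u^n, from u^{n+1} = (1 + lam*tau) u^n."""
    return math.log(1.0 + lam * tau) / tau


def effective_rate_trapezoidal(lam, tau):
    """lambda* for (u^{n+1} - u^n)/tau = lam*(u^{n+1} + u^n)/2."""
    return math.log((1.0 + 0.5 * lam * tau) / (1.0 - 0.5 * lam * tau)) / tau


lam = 1.0
for lam_tau in (0.1, 0.05):
    first = effective_rate_explicit(lam, lam_tau) / lam - 1.0
    second = effective_rate_trapezoidal(lam, lam_tau) / lam - 1.0
    print(f"lam*tau = {lam_tau}: first order {100 * first:+.2f}%, "
          f"second order {100 * second:+.3f}%")
```

For $\lambda\tau = 0.1$ the first-order scheme underestimates $\lambda$ by a little under 5%, while the second-order scheme is off by less than 0.1%, so neither accounts for errors much larger than these.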


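A scheme of second-order accuracy $(u^{n+1} - u^n)/\tau = \lambda(u^{n+1} + u^n)/2$ gives a solution with $\lambda^*$ differing from $\lambda$ by ~0.1% for $\lambda\tau = 0.1$. The method of alternating directions, possessing second-order accuracy in $\tau$, is actually close to this scheme. The most probable source of the error is the non-permutability of the operators

$$\frac{\partial}{\partial x}C\frac{\partial}{\partial x}, \qquad \frac{\partial}{\partial y}C\frac{\partial}{\partial y}, \qquad \frac{\partial}{\partial x}D\frac{\partial}{\partial x}, \qquad \frac{\partial}{\partial y}D\frac{\partial}{\partial y}.$$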
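We illustrate this by the example of the single equation $u_t = (Cu_x)_x + (Cu_y)_y$. Solving this by the method of alternating directions with step $\tau$, we obtain the following connection between $u^{n+1} = u(t_{n+1})$ and $u^n = u(t_n)$:

$$u^{n+1} = \left(E - \tau\frac{\partial}{\partial y}C\frac{\partial}{\partial y}\right)^{-1}\left(E + \tau\frac{\partial}{\partial x}C\frac{\partial}{\partial x}\right)\left(E - \tau\frac{\partial}{\partial x}C\frac{\partial}{\partial x}\right)^{-1}\left(E + \tau\frac{\partial}{\partial y}C\frac{\partial}{\partial y}\right)u^n \equiv B_\tau u^n.$$

It is known that a lengthy calculation by this formula leads to the isolation of the principal eigenfunction of the operator $B_\tau$. Because of the non-permutability of the operators, the eigenfunctions of $B_\tau$ are not identical with the eigenfunctions of the operator

$$\frac{\partial}{\partial x}C\frac{\partial}{\partial x} + \frac{\partial}{\partial y}C\frac{\partial}{\partial y};$$

this difference is the more important the greater the value of $\tau$.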
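The non-permutability referred to above can be seen on a very small mesh. The sketch below builds the usual three-point difference analogues of $(Cu_x)_x$ and $(Cu_y)_y$ on the unit square with zero boundary values and measures the commutator applied to $u = xy(1 - x)(1 - y)$. The mesh size and the two coefficient functions are assumptions made for illustration (one constant coefficient, one with a single corner zone of a different value); they are not the 25-zone configuration of the experiment.

```python
def apply_x(coef, u, h):
    """Difference analogue of (c u_x)_x; u vanishes on the boundary of the square."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = (i + 1) * h, (j + 1) * h
            left = u[i - 1][j] if i > 0 else 0.0
            right = u[i + 1][j] if i < n - 1 else 0.0
            out[i][j] = (coef(x + h / 2, y) * (right - u[i][j])
                         - coef(x - h / 2, y) * (u[i][j] - left)) / h ** 2
    return out


def apply_y(coef, u, h):
    """Difference analogue of (c u_y)_y; u vanishes on the boundary of the square."""
    n = len(u)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = (i + 1) * h, (j + 1) * h
            lower = u[i][j - 1] if j > 0 else 0.0
            upper = u[i][j + 1] if j < n - 1 else 0.0
            out[i][j] = (coef(x, y + h / 2) * (upper - u[i][j])
                         - coef(x, y - h / 2) * (u[i][j] - lower)) / h ** 2
    return out


def commutator_norm(coef, n=7):
    """max |Ax Ay u - Ay Ax u| over the n x n interior nodes."""
    h = 1.0 / (n + 1)
    u = [[(i + 1) * h * (j + 1) * h * (1 - (i + 1) * h) * (1 - (j + 1) * h)
          for j in range(n)] for i in range(n)]
    ab = apply_x(coef, apply_y(coef, u, h), h)
    ba = apply_y(coef, apply_x(coef, u, h), h)
    return max(abs(ab[i][j] - ba[i][j]) for i in range(n) for j in range(n))


def constant(x, y):
    return 1.0


def zoned(x, y):
    # Hypothetical two-zone coefficient: a corner zone with a ten times smaller value.
    return 1.0 if (x < 0.5 and y < 0.5) else 10.0


print("constant coefficient:", commutator_norm(constant))
print("zoned coefficient:   ", commutator_norm(zoned))
```

With a constant coefficient the two difference operators commute up to rounding error; with the zoned coefficient the commutator is of the same order as the operators themselves.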


3. The method of solution 


The first stage of the physical process proceeds at the fixed temperature $T(r, z, t)$, that is, with values of the coefficients of the system constant in time. Then the solution can be represented by a Fourier series:

$$\Phi(r, z, t) = \sum_{k=1}^{\infty} c_k\exp(\Lambda_k t)\,\varphi_k(r, z); \qquad (3.1)$$

here $\varphi_k(r, z)$ are the eigenfunctions, and $\Lambda_k$ are the eigenvalues of the operator $L\varphi_k = \Lambda_k Q\varphi_k$. The operator $L$ is elliptic, its spectrum is real and bounded above: $-\infty < \ldots < \Lambda_k < \ldots < \Lambda_1 = \Lambda$. If $\Lambda > 0$ the system is supercritical, if $\Lambda < 0$ the system is subcritical. At the initial instant ($t = 0$) a supercriticality is created in the system.


We write (3.1) in the form

$$\Phi(r, z, t) = \exp(\Lambda_1 t)\sum_{k=1}^{\infty} c_k\exp[-(\Lambda_1 - \Lambda_k)t]\,\varphi_k(r, z), \qquad (\Lambda_1 - \Lambda_k)t \gg 1 \ \text{if } k \ne 1. \qquad (3.2)$$

It is obvious from (3.2) that at the first stage (since its duration is very great) a dominant part in the right side of (3.1) will be played by only the term corresponding to the extreme right point of the spectrum: $\Phi(r, z, t) \approx c_1\exp(\Lambda_1 t)\,\varphi_1(r, z)$.
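How quickly the first term takes over can be illustrated with a few numbers. The sketch below evaluates, for amplitudes only (the spatial factors $\varphi_k$ are ignored), the share of the sum (3.1) carried by the $k = 1$ term. The eigenvalues and coefficients are hypothetical values chosen for illustration; only the leading value of 200 sec$^{-1}$ is of the order quoted later in the paper for a fast pulse.

```python
import math


def first_mode_fraction(eigenvalues, coeffs, t):
    """Share of the k = 1 term in sum_k c_k exp(Lambda_k t) (amplitudes only).

    The factor exp(Lambda_1 t) is taken out, as in (3.2), to avoid overflow.
    """
    lam1 = eigenvalues[0]
    terms = [c * math.exp(-(lam1 - lam) * t)
             for lam, c in zip(eigenvalues, coeffs)]
    return terms[0] / sum(terms)


# Hypothetical spectrum (sec^-1) and equal expansion coefficients.
eigenvalues = [200.0, 50.0, -100.0]
coeffs = [1.0, 1.0, 1.0]

for t in (0.0, 0.02, 0.1):
    print(f"t = {t:5.2f} sec: first-mode share = "
          f"{first_mode_fraction(eigenvalues, coeffs, t):.9f}")
```

With these assumed values the higher terms contribute less than one part in a million after 0.1 sec, although all three terms start with equal weight.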


Therefore, instead of calculating the first stage of the process we must determine the first eigenfunction $\varphi_1(r, z)$ of the operator $L\varphi = \Lambda Q\varphi$, corresponding to the extreme right point of the spectrum $\Lambda = \Lambda_1$, and proceed to the calculation of the second stage of the process, when the temperature begins to change. As initial data for calculating the second stage we have to take the function $\Phi(r, z, t_0) = N(t_0)\,\varphi_1(r, z)$, where $N(t_0)$ is a value sufficiently great for the process of temperature variation to have begun. The quantity $N(t_0)$ can be estimated: the characteristic time of the process $\Delta t$ (0.1 to 1 sec) is known, and the characteristic variation of temperature $\Delta T$ (~1000°) is known. We can use the tentative relation $\Delta T/\Delta t \approx A_{32}N(t_0)\max\varphi_2(r, z)$; then $N(t_0) = \Delta T\,(A_{32}\max\varphi_2\,\Delta t)^{-1}$, where $\varphi = (\varphi_1, \varphi_2)$.


Usually $N(t_0)$ is taken less than the required value by a factor of 10 to 100; this leads to a situation where the calculation of the first few steps in time is performed with a practically unchanged $T(r, z, t)$, that is, the "tail" of the first stage of the process is computed (see Fig. 1).


As for the initial "background" $\Phi(r, z, 0)$, the following natural assumptions about it are made:


1) in the expansion of $\Phi(r, z, 0)$ in a Fourier series the coefficient of the first eigenfunction is not too small in comparison with the others;

2) the absolute value of $\Phi(r, z, 0)$ is sufficiently small and the time necessary for $\|\Phi(r, z, t)\|$ to become a quantity of order $\sim N(t_0)$, at which the change of temperature has begun, is sufficiently great for all the terms, except the first, in the sum of (3.1) to be neglected.


For the calculation of the second and third stages of the process, when the temperature effects lead to a spatially inhomogeneous variation of the cross-sections, a method was used which is a generalization of the Fourier method, perhaps rather less accurate, but more economical from the point of view of computing time.


An approximate solution $\Phi(r, z, t) = (\Phi_1(r, z, t), \Phi_2(r, z, t))$ of the system (2.1), (2.2) is sought in the form

$$\Phi(r, z, t) = N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\varphi(r, z, t), \qquad (3.3)$$

where $\varphi(r, z, t) = (\varphi_1(r, z, t), \varphi_2(r, z, t))$ is the normed first eigenfunction, and $\Lambda(t)$ is the first eigenvalue of the operator

$$L(t)\,\varphi(r, z, t) = \Lambda(t)\,Q(t)\,\varphi(r, z, t); \qquad (3.4)$$

$L$, $\Lambda$, $Q$ depend on the time implicitly via the temperature $T(r, z, t)$ occurring in the coefficients.


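The equations for $C_i(r, z, t)$ and $T(r, z, t)$ assume the form

$$\frac{\partial C_i}{\partial t} = -\lambda_i C_i + \beta_i\,\nu\Sigma_f\,N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\varphi_2(r, z, t),$$

$$\frac{\partial T}{\partial t} = A_{32}\,N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\varphi_2(r, z, t). \qquad (3.5)$$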


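To estimate the error of this method we substitute the approximate solution (3.3) into the left side of (2.1); using (3.4) we obtain

$$Q\frac{\partial}{\partial t}\left\{N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\varphi\right\} = Q\left\{\Lambda(t)N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\varphi + N(t_0)\exp\left[\int_{t_0}^{t}\Lambda(\tau)\,d\tau\right]\frac{\partial\varphi}{\partial t}\right\} = L\Phi + Q(\Delta\Lambda)\Phi, \qquad (3.6)$$

$$\Delta\Lambda = \mathrm{diag}\left\{\frac{\partial\ln\varphi_1}{\partial t},\ \frac{\partial\ln\varphi_2}{\partial t}\right\},$$

that is, (3.3) satisfies not (2.1), but the equation

$$Q\left(\frac{\partial}{\partial t} - \Delta\Lambda\right)\Phi = L\Phi. \qquad (3.7)$$

It is obvious from (3.6) that the effect of the error depends on the quantities

$$\frac{1}{v_j}\frac{\partial\ln\varphi_j}{\partial t}\,\Phi_j, \qquad j = 1, 2. \qquad (3.8)$$

Calculations have shown that the components of (3.8) are less by a factor of more than 100 than the individual terms occurring in the expression $Q\,\partial\Phi/\partial t - L\Phi$.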


In connection with Eq. (3.7) it is also understood that the method of norming the eigenfunction still remains arbitrary. The normalization is defined by the relation

$$\int_0^H\!\!\int_0^R [\varphi_2(r, z, t)]^2\,r\,dr\,dz = 1. \qquad (3.9)$$

With this normalization the value of $\partial\varphi_2/\partial t$ is a minimum; the value of $\partial\varphi_1/\partial t$ is greater than, for example, in a normalization of the type

$$\|\varphi\|^2 = \int_0^H\!\!\int_0^R (\varphi_1^2 + \varphi_2^2)\,r\,dr\,dz = 1,$$

but it is the smallness of $\partial\varphi_2/\partial t$ which is important to us, since the equation contains the expressions $(1/v_j)\,\partial\ln\varphi_j/\partial t$, $j = 1, 2$, and $1/v_2 \gg 1/v_1$.


Therefore, the solution of the system (3.3)-(3.5) satisfies Eqs. (1.3), it satisfies with high accuracy Eqs. (1.1) and (1.2), it satisfies the boundary conditions (1.8), (1.9), and also the initial conditions, since as a result of the first stage of the process, up to the beginning of the second stage, an overwhelming part in the solution of (3.1) is played by the first term, which may be written in the form $N(t_0)\,\varphi(r, z, t_0)$.


The basis of the numerical method is to find the first eigenfunction $(\varphi_1, \varphi_2)$ and the corresponding eigenvalue $\Lambda(t)$ of the elliptic operator $Q^{-1}L(t)$ at each time step. The calculation of the process usually consists of 20 to 30 steps.


We describe the structure of one time step, ignoring for brevity the effect of the delayed 
neutrons. 

The time $t$ is subdivided into intervals small relative to the rate of change of temperature, called "steps" in $t$: $t_0 < t_1 < \ldots$; however, relative to the speed of the neutron processes the step $\Delta t_n = t_{n+1} - t_n$ is "large".


At the instant $t_n$ let us have $T(r, z, t_n)$, $N(t_n)$, $\Lambda(t_n)$, $\varphi_1(r, z, t_n)$, $\varphi_2(r, z, t_n)$. The standard step, the transition from $t_n$ to $t_{n+1}$, consists of the following stages:


1) the choice of the step $\Delta t_n$ from the calculation of the given (~10 to 15%) increment of temperature;

2) the calculation of the temperature $T(r, z, t_{n+1})$ at the instant $t_{n+1} = t_n + \Delta t_n$:

$$T(r, z, t_{n+1}) = T(r, z, t_n) + \int_{t_n}^{t_{n+1}} A_{32}(r, z, t_n)\,\varphi_2(r, z, t_n)\,N(\tau)\,d\tau;$$

at the same time we compute

$$N(t_{n+1}) = N(t_n)\exp\left[\int_{t_n}^{t_{n+1}}\Lambda(\tau)\,d\tau\right];$$

3) finding the first eigenfunction $(\varphi_1(r, z, t_{n+1}), \varphi_2(r, z, t_{n+1}))$ and eigenvalue $\Lambda(t_{n+1})$ of the operator

$$L(t_{n+1})\,\varphi = \Lambda(t_{n+1})\,Q(t_{n+1})\,\varphi.$$

The description of the method of finding the first eigenfunction forms the subject of a separate paper (see [4, 5]).


This completes the calculation of the fundamental values at one time step; for the next time step we again have $T(r, z, t_{n+1})$, $N(t_{n+1})$, $\Lambda(t_{n+1})$, $\varphi_1(r, z, t_{n+1})$, $\varphi_2(r, z, t_{n+1})$.

Remark. The reasons for the leading role of the first eigenfunction in such phenomena are well known to physicists, and we are informed by Ya. B. Zel'dovich and V. Ya. Gol'din that in their time they were used in some calculations.


4. Examples of calculations 
The method explained in section 3 was used to simulate on the BESM-6 computer the kinetics of a neutron burst in a pulsed reactor of heat-capacity type IGR [3].
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The function 5.10~@: 1 is for t = 0.0407 sec, 2 is for t = 0.0477 sec, 3 is for t = 0.059 sec, 4 is for 
t = 0.0619 sec, 5 is for t = 0.0705 sec, 6 is for t = 0.0769 sec, 7 is for t = 0.088 sec, 8 is for t= 0.112 
sec;a = 18 for 1,2, 8;a= 19 for 3 —7. 
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Figure 1 shows the functions A(t), M(t) and 7 (t)=max T(r, z, t). characteristic for 
(r, 2) 


the pulsed mode. All four stages of the phenomenon considered are easily seen. As a control 

this pulse was calculated twice; in the first case the step At,, was chosen so as to ensure an increment 
of temperature 7(r, z, t,,) of ~ 10%, the corresponding values of A(t) are shown by dots on the 
graph. Then the calculation was repeated with a temperature increment of ~ 20% per step, the 
corresponding values of A(t) are shown by circles. In both cases all the values of A(t) are shown on 
the graph, and this permits us to judge the number of time steps. We must bear in mind that the 
function @(t)=N(t)q@_ has physical significance and for N(t) 3-10" the quantity 

 ~ 1018cm-2sec—!. 


Figure 2 gives an idea of the accuracy of the approximation (13), that is, of the possibility
of neglecting the quantity $\partial \ln \varphi/\partial t$. The graph shows the function

$$\lambda^{-1}(t)\, \frac{\partial \ln \bar\varphi_2(t)}{\partial t}, \quad \text{where } \bar\varphi_2(t) = \max_{(r,z)} \varphi_2(r, z, t).$$

This relation is taken from the calculation of a very fast pulse with great initial supercriticality
$\lambda(0) = 200$ sec$^{-1}$.


In the simulation of a neutron burst on prompt neutrons the calculation of the second
stage of the process began from the level $\Phi_0 = 10^{15}$ n/cm$^2$ sec; the initial supercriticality was
$\lambda(0) = 201$ sec$^{-1}$, the time for the reactor to reach criticality ($\lambda = 0$) was $t \approx 0.07$ sec, the total
duration of the pulse was ~ 0.1 sec, and the halfwidth of the pulse was $\tau_{1/2} \approx 0.045$ sec. At the end
of the burst, when the mean temperature of the stack of the active zone was $\bar T = 2430$°K, the
subcriticality was $\lambda \approx -60$ sec$^{-1}$. This pulse is shown in Fig. 3.


Figure 4 illustrates the effect of the delayed neutrons, which calculations show to be
appreciable only in slow bursts, characterized by small values of the initial supercriticality,
$\lambda(0) \approx 20$ sec$^{-1}$. The figure shows the function

$$\bar\Phi_2(t) = \max_{(r,z)} \Phi_2(r, z, t),$$

obtained without allowing for the delayed neutrons (curve 1, $\tau_{1/2} = 0.2$ sec) and allowing for the
delayed neutrons (curve 2, $\tau_{1/2} = 0.205$ sec). It is seen that allowing for the integral term (1.10) in
Eq. (1.1) gives a maximum difference in the amplitude of the neutron flux of the order of several
percent; the pulse halfwidth $\tau_{1/2}$ is then increased by approximately 2-3%. For these calculations
the function $\lambda^{-1}\, \partial \ln \bar\varphi_2/\partial t$ is of approximately the same nature as the function of Fig. 2.


In conclusion we mention the paper [6] in which a method of numerical integration of the 
one-dimensional kinetic equation with a sharply increasing solution is proposed. The principal 
content of this paper is the surmounting of the constraints on the time step. 


Translated by J. Berry

ASYMPTOTIC ANALYSIS OF LOW MACH NUMBER FLOWS OF VISCOUS
COMPRESSIBLE FLUIDS*


I. The case of weightless fluids


R. Kh. ZEYTOUNIAN 
Lille, France 


(Received 10 November 1975) 


THE GENERAL problem of the analysis of a stationary flow of a viscous compressible fluid at low
Mach number is studied by a perturbation method, in which the solution is represented by
asymptotic expansions in powers of the characteristic Mach number $M_\infty$. It is shown that these
expansions may take various forms depending on the temperature condition on the closed surface
$\Sigma^*$ which bounds the flow internally.

More precisely, if this condition is written in the form

$$T^* = T_\infty^* + \Delta T_0^*\, \Xi \quad \text{on } \Sigma^*,$$

where $T_\infty^*$ and $\Delta T_0^*$ are constant characteristic temperatures connected respectively with the
uniform flow far from $\Sigma^*$ and with the temperature variation $\Xi$ on $\Sigma^*$, and if it is assumed that
$\tau_0 = \Delta T_0^*/T_\infty^* \to 0$ as $M_\infty \to 0$ in such a way that $\tau_0 = \Lambda_0 M_\infty^{\omega}$, where $\Lambda_0$ is a constant
similarity parameter and $\omega > 0$ is a given real number, then three cases arise: $\omega < 2$, $\omega = 2$
and $\omega > 2$.

The cases $\omega < 2$ and $\omega > 2$ lead to asymptotic expansions built on the asymptotic sequence
$M_\infty^{2p+\omega q}$ with $p, q = 0, 1, \dots$; when $\omega = 2$ the asymptotic sequence coincides with the
classical one, $M_\infty^{2n}$ with $n = p + q = 0, 1, \dots$ (that of Janzen-Rayleigh [1, 2]).

This makes it possible, in particular, to specify the evolution of the temperature and density in
a low Mach number (quasi-incompressible) flow. Consequently, whenever an incompressible flow
of a viscous fluid can be calculated, a calculation of the temperature and density fields can also be
associated with it, giving a correct representation (in the sense of asymptotic expansions) of the
solution of the Navier-Stokes equations.





*Zh. vychisl. Mat. mat. Fiz., 17, 175-182, 1977.

1. Formulation of the problem

The viscous compressible fluid is assumed to be a perfect gas with constant specific heats $c_p$ and
$c_v$; it extends to infinity in all directions and is bounded internally by the closed surface $\Sigma^*$.
At large distances from $\Sigma^*$ we assume that there is a uniform flow with constant velocity
$v_\infty^*$, in which the pressure $p_\infty^*$, the density $\rho_\infty^*$ and the temperature $T_\infty^*$ are constant.
We use an orthogonal Cartesian frame and coordinate system $\{0, x_i^*\}$.

Finally, we denote by $v_i^*$, $p^*$, $\rho^*$ and $T^*$ the velocity components, the pressure, the density
and the temperature in the flow induced by the presence of $\Sigma^*$ within the uniform flow.


In all that follows the flow velocity at every point is assumed to be very small compared with
the local speed of sound; in other words, the characteristic Mach number of the flow satisfies

$$M_\infty = \frac{U_\infty^*}{(\gamma R T_\infty^*)^{1/2}} \ll 1, \qquad (1.1)$$

where $U_\infty^*$ is a characteristic velocity connected with $v_\infty^*$, and $\gamma = c_p/c_v$, $R = c_p(\gamma - 1)/\gamma$.
In what follows the solution of the Navier-Stokes equations is therefore represented by asymptotic
expansions with respect to the characteristic Mach number $M_\infty$. Note that for flows of inviscid
compressible fluids this is a classical method in the stationary case, known as the Janzen-Rayleigh
method [1, 2].
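As a rough numerical illustration of definition (1.1), the sketch below evaluates $R$ and $M_\infty$. The function names and the air-like sample values in the usage note are assumptions made for the example; they do not come from the paper.

```python
import math

def gas_constant(c_p, gamma):
    """R = c_p * (gamma - 1) / gamma, as defined after (1.1)."""
    return c_p * (gamma - 1.0) / gamma

def mach_number(u_inf, gamma, gas_const, t_inf):
    """Characteristic Mach number M = U / sqrt(gamma * R * T), formula (1.1)."""
    return u_inf / math.sqrt(gamma * gas_const * t_inf)
```

For instance, with the assumed values `c_p = 1005.0` J/(kg K), `gamma = 1.4`, `t_inf = 288.0` K and `u_inf = 10.0` m/s, the Mach number comes out at about 0.03, well inside the regime $M_\infty \ll 1$ considered here.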


By contrast, the unsteady case differs fundamentally from the stationary case, for it has been
shown [3, 4] that these expansions, which are in fact inner expansions [5], are then not uniformly
valid at large distances from $\Sigma^*$. The solution at large distances from $\Sigma^*$ is therefore
represented by outer expansions. The matching conditions between these two expansions then make
it possible to determine the solution completely, both in the far field and in the near field, by
writing the composite expansion to the corresponding order of approximation. In particular, it
turns out to be necessary to introduce into the inner expansions terms in odd powers of $M_\infty$
(terms which are absent from the classical Janzen-Rayleigh expansion) for this matching to be
possible.


Here we are concerned more particularly with the viscous compressible fluid and with the near
field close to the surface $\Sigma^*$, on which the no-slip conditions for the viscous fluid and a condition
for the temperature must be written. For this purpose, and to avoid any ambiguity, we shall assume
that the flow is stationary; the Navier-Stokes equations governing this stationary flow of a perfect
gas are written in dimensionless form as the system (1.2).


The dimensional quantities being marked by an asterisk, the dimensionless variables appearing
in equations (1.2) are defined as follows: $\rho = \rho^*/\rho_\infty^*$, $p = p^*/p_\infty^*$, $T = T^*/T_\infty^*$,
$v_k = v_k^*/U_\infty^*$, $x_k = x_k^*/L_0^*$, where $L_0^*$ is a characteristic length (connected with $\Sigma^*$, in
particular). In addition to the characteristic Mach number (1.1), the dimensionless equations (1.2)
involve the Reynolds number $Re = U_\infty^* L_0^*/\nu_0^*$, where $\nu_0^*$ is the (constant) kinematic viscosity
coefficient, and the Prandtl number $Pr = c_p \nu_0^* \rho_\infty^*/k_0^*$, where $k_0^*$ is the (constant) conduction
coefficient. Finally, note that equations (1.2) are written under the Stokes hypothesis (zero bulk
viscosity coefficient).
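The two similarity parameters just defined can be evaluated directly from the dimensional data. The sketch below is illustrative only; the function names and the air-like sample values in the usage note are assumptions, not values from the paper.

```python
def reynolds_number(u_inf, length, nu0):
    """Re = U_inf* L_0* / nu_0* (nu_0*: constant kinematic viscosity)."""
    return u_inf * length / nu0

def prandtl_number(c_p, nu0, rho_inf, k0):
    """Pr = c_p nu_0* rho_inf* / k_0* (k_0*: constant conduction coefficient)."""
    return c_p * nu0 * rho_inf / k0
```

With the assumed values `nu0 = 1.5e-5` m²/s, `rho_inf = 1.2` kg/m³, `k0 = 0.0257` W/(m K) and `c_p = 1005.0` J/(kg K), the Prandtl number is close to 0.7. Both parameters are held fixed while $M_\infty \to 0$ in the expansions that follow.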


The solution of equations (1.2) satisfies the following boundary conditions (the flow being
assumed continuous everywhere):
1) conditions at infinity, where there is a uniform flow;
2) no-slip conditions on $\Sigma$: $v_k = 0$;
3) a condition for the temperature on $\Sigma$, which we write in the following dimensionless form:

$$T = 1 + \tau_0 \Xi, \qquad (1.3)$$

where $\tau_0 = \Delta T_0^*/T_\infty^*$, with $\Delta T_0^*$ a characteristic temperature variation connected with the
function $\Xi$, which is assumed known.

The above conditions are assumed sufficient to make the solution of equations (1.2)
unique; in addition, the position and shape of $\Sigma$ are given.
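2. Incompressible flow

We represent the solution of equations (1.2) satisfying the boundary conditions, as
$M_\infty \to 0$ with $x_k$, $Pr$ and $Re$ fixed, by asymptotic expansions of the form

$$v_k = v_k^{(0)} + M_\infty^{\alpha} v_k^{(\alpha)} + \dots, \qquad p = p^{(0)} + M_\infty^{\beta} p^{(\beta)} + \dots,$$
$$\rho = \rho^{(0)} + M_\infty^{\gamma} \rho^{(\gamma)} + \dots, \qquad T = T^{(0)} + M_\infty^{\delta} T^{(\delta)} + \dots,$$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are positive real numbers, arbitrary for the moment.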


The second equation of system (1.2) necessarily implies that

$$\partial p^{(0)}/\partial x_i = 0 \;\Rightarrow\; \beta = 2,$$

that is, taking into account the conditions at infinity for $p^{(0)}$,

$$p^{(0)} \equiv 1.$$


Equations (1.2) can then be written, to a first approximation, in the form:


av 
OX ) ‘ 


sage a ore 
ane oe Sabie fea ae 
OX, PrRe dz, 
pT) == 4, 
For the limit system (2.1) the no-slip conditions and the conditions at infinity remain
unchanged (since the flow is stationary and regular at infinity). If, on the other hand,
no hypothesis is made about the parameter $\tau_0$ in condition (1.3), then no further
simplification of the limit system (2.1) can be carried out.


To obtain the equations governing the incompressible flow it must be assumed that
in the boundary condition (1.3) the parameter $\tau_0$ is much smaller than unity:

$$\tau_0 \ll 1 \;\Rightarrow\; \Delta T_0^* \ll T_\infty^*;$$

more precisely, we assume that $\tau_0 \to 0$ as $M_\infty \to 0$ in such a way that the similarity relation

$$\tau_0 = \Lambda_0 M_\infty^{\omega} \qquad (2.2)$$

is satisfied, where $\Lambda_0$ is a constant similarity parameter and $\omega > 0$ is a given real number.

In this case, thanks to hypothesis (2.2), the following condition can be associated with equations (2.1):

$$T^{(0)} = 1 \quad \text{on } \Sigma.$$


Since $T^{(0)} = 1$ and $\rho^{(0)} = 1$ at infinity, it is clear that equations (2.1) admit the trivial
solution

$$T^{(0)} \equiv 1, \qquad \rho^{(0)} \equiv 1,$$

which leads us to the Navier equations for an incompressible flow [6]:

$$\frac{\partial v_k^{(0)}}{\partial x_k} = 0, \qquad v_k^{(0)} \frac{\partial v_i^{(0)}}{\partial x_k} + \frac{1}{\gamma} \frac{\partial p^{(2)}}{\partial x_i} = \frac{1}{Re} \frac{\partial^2 v_i^{(0)}}{\partial x_k \partial x_k}, \qquad (2.3)$$

with which the conditions

$$v_k^{(0)} = 0 \ \text{on } \Sigma; \qquad v^{(0)} \to v_\infty \ \text{and} \ p^{(2)} \to 0 \ \text{at infinity} \qquad (2.4)$$

must be associated.

3. Quasi-incompressible flow

At the level of the first approximation, which leads to the incompressible flow, all
information on the evolution of the temperature and density in the low Mach number flow
is lost.


Consider first the continuity equation; taking into account that $\rho^{(0)} \equiv 1$ and
$\partial v_k^{(0)}/\partial x_k = 0$, we obtain for $\rho^{(\gamma)}$ the equation


do’? ne aoe 
Fes AR so 
OX 


only the case $\alpha = \gamma$ leads to a significant degeneracy [7], which gives:





0 (1) 

0 

a” . 
Ox, 


the value $\gamma > 0$ remaining undetermined at this stage.
In order to specify the value of $\gamma$, consider the equation of state; three cases may then arise:
$2 = \delta = \gamma$, $\delta = \gamma$ and $\gamma = 2$.


Finally, the energy equation can be written in the following form:


ee ae ah kb 
~ Pr Re Oz, 








(0) ty)» 
- ( = : 7 ) 
i 
fo] 
OX, OX; 


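with which, under hypothesis (2.2), the condition

$$T^{(\delta)} = M_\infty^{\omega - \delta} \Lambda_0 \Xi \quad \text{on } \Sigma \qquad (3.1)$$

must be associated.

In order not to lose condition (3.1) it seems, at first sight, that one must assume that

$$\delta = \omega, \quad \text{with } \omega > 0 \text{ a given real number}. \qquad (3.2)$$

Three cases must then be considered.

1. $0 < \omega < 2$

In this case, if we assume that (3.2) actually holds, we obtain

$$0 < \alpha = \gamma = \delta = \omega < 2,$$

and the functions $\rho^{(\omega)}$, $v_k^{(\omega)}$ and $T^{(\omega)}$ satisfy the equations: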


(@) 
a » 6 dp 
ee ae Se 

OL, Oz, 


? 
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p=—T 


gor? 41 42 ere 


v —_—_—_— 3.3 
To obtain the equation of motion associated with system (3.3), the asymptotic expansion
of the pressure must be of the form

$$p = 1 + M_\infty^{2} p^{(2)} + M_\infty^{2+\omega} p^{(2+\omega)} + \dots,$$

which gives the following equation of motion:
av ms av.” 4 apt 


@ 
v} +p; 


Oz, OX, Y Ox; 


. @ ® 0 
mn = { Fv o fa = (3 $Pcoiy OPE 
Re l 02,’ 3 02; \ Cy Le 


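We first determine $T^{(\omega)}$ from the linear homogeneous equation (3.3c), with which
the conditions

$$T^{(\omega)} = \Lambda_0 \Xi \ \text{on } \Sigma \quad \text{and} \quad T^{(\omega)} \to 0 \ \text{at infinity} \qquad (3.5)$$

must be associated. Then $\rho^{(\omega)} = -T^{(\omega)}$, and after that $v_k^{(\omega)}$ and $p^{(2+\omega)}$ are found from
the system of two linear non-homogeneous equations (3.3a) and (3.4), with which the conditions

$$v_k^{(\omega)} = 0 \ \text{on } \Sigma; \qquad v_k^{(\omega)} \ \text{and} \ p^{(2+\omega)} \to 0 \ \text{at infinity} \qquad (3.6)$$

must be associated.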


2. $\omega = 2$

In this case

$$T = 1 + M_\infty^{2} \Lambda_0 \Xi \quad \text{on } \Sigma,$$

and we return to the classical case of Janzen and Rayleigh [1, 2]:

$$0 < \alpha = \gamma = \delta = \omega = 2.$$

Once again $v_k^{(0)}$ and $p^{(2)}$ satisfy the classical hydrodynamic problem (2.3),
(2.4). As for the functions $v_k^{(2)}$, $p^{(4)}$, $\rho^{(2)}$ and $T^{(2)}$, they must be determined from the
system of equations:

$$T^{(2)} + \rho^{(2)} = p^{(2)}, \qquad (3.7a)$$
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0 
+——{ 
3 dx, \ Oz, 


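We determine $T^{(2)}$ from the linear non-homogeneous equation (3.7b) with the conditions

$$T^{(2)} = \Lambda_0 \Xi \ \text{on } \Sigma \quad \text{and} \quad T^{(2)} \to 0 \ \text{at infinity}.$$

Then $\rho^{(2)} = p^{(2)} - T^{(2)}$, and after that $v_k^{(2)}$ and $p^{(4)}$ are determined from the two equations
(3.7c) and (3.7d), with which the conditions

$$v_k^{(2)} = 0 \ \text{on } \Sigma; \qquad v_k^{(2)} \ \text{and} \ p^{(4)} \to 0 \ \text{at infinity} \qquad (3.8)$$

must be associated.

3. $\omega > 2$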


This case is interesting in that relation (3.2) can no longer be assumed to hold (!).
To obtain a significant degeneracy we must have $\delta = 2$, and for $T^{(2)}$ we obtain the
linear non-homogeneous equation (3.7b), which must then be solved with zero conditions:

$$T^{(2)} = 0 \ \text{on } \Sigma \ \text{and at infinity}. \qquad (3.9)$$

To recover the boundary condition for the temperature on $\Sigma$ a term of the form
$M_\infty^{\omega} T^{(\omega)}$ must be brought into the asymptotic expansion of the temperature, the function
$T^{(\omega)}$ satisfying problem (3.3c), (3.5).


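Thus, in this case the following asymptotic expansions must be adopted:

$$v_k = v_k^{(0)} + M_\infty^{2} v_k^{(2)} + M_\infty^{\omega} v_k^{(\omega)} + \dots,$$
$$p = 1 + M_\infty^{2} p^{(2)} + M_\infty^{4} p^{(4)} + M_\infty^{2+\omega} p^{(2+\omega)} + \dots,$$
$$\rho = 1 + M_\infty^{2} \rho^{(2)} + M_\infty^{\omega} \rho^{(\omega)} + \dots, \qquad T = 1 + M_\infty^{2} T^{(2)} + M_\infty^{\omega} T^{(\omega)} + \dots.$$

The functions $v_k^{(0)}$ and $p^{(2)}$ still satisfy (2.3) and (2.4). The functions $v_k^{(2)}$, $p^{(4)}$,
$\rho^{(2)}$ and $T^{(2)}$ satisfy equations (3.7), with which conditions (3.8) and (3.9) must be associated.
Finally, the function $T^{(\omega)}$ satisfies, as already noted, problem (3.3c), (3.5); then
$\rho^{(\omega)} = -T^{(\omega)}$, and $v_k^{(\omega)}$ and $p^{(2+\omega)}$ satisfy problem (3.3a), (3.4), (3.6).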
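The gauge exponents that appear in these expansions all belong to the family $2p + \omega q$ with $p, q = 0, 1, \dots$. The sketch below lists that family up to a cut-off; the function name and the cut-off argument are assumptions introduced only for this illustration.

```python
def gauge_exponents(omega, max_exponent):
    """Sorted distinct exponents 2p + omega*q (p, q = 0, 1, ...) up to max_exponent."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    exponents = set()
    p = 0
    while 2 * p <= max_exponent:
        q = 0
        while 2 * p + omega * q <= max_exponent:
            # rounding merges values that coincide up to floating-point error
            exponents.add(round(2 * p + omega * q, 12))
            q += 1
        p += 1
    return sorted(exponents)
```

For $\omega = 2$ only the even powers of the classical Janzen-Rayleigh sequence remain, whereas for $\omega = 3$ the odd exponents 3 and 5 are interleaved with them.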


In conclusion, we note that this method is easily extended to the case where the
temperature condition on the surface $\Sigma^*$ is of the form

$$-k_0^* \frac{\partial T^*}{\partial x_i^*}\, n_i^* = \Delta\Phi_0^*\, \Phi \quad \text{on } \Sigma^*,$$

where $\Delta\Phi_0^*$ is a constant characteristic flux connected with $\Phi$, which is assumed given, and
$n^* = \{n_i^*\}$ is the unit vector of the outward normal to $\Sigma^*$.

We take this opportunity to thank Professor J. P. Guiraud of the University of
Paris VI for discussions and suggestions concerning, in particular, the case $\omega > 2$.

Finally, we note that the second part of this work will be devoted to the case of heavy
rotating fluids, which will enable us, in particular, to obtain in a rational manner the so-called
Boussinesq equations and the boundary conditions that must be associated with them.

A DIFFERENCE SCHEME FOR THE PROBLEM OF THE STRONG
BENDING OF THIN PLATES*


C. N. VOLOSHANOVSKAYA and M. M. KARCHEVSKII 
Kazan’ 


(Received 23 April 1975; revised 4 November 1975) 


THE FIRST boundary value problem for a system of non-linear differential equations describing 
the strong bending of flexible thin plates is discussed. 


In this paper we construct and investigate a difference scheme for a system of non-linear
differential equations describing the strong bending of a thin flexible plate, with the displacements
occurring under the action of an arbitrarily directed external load (see, for example, [1]).

Sufficient conditions for the existence and uniqueness of the solution of the differential
boundary value problem are first established.

Questions of the existence of the solution in the case of a normal load have previously
been considered by many authors (see, for example, [2, 3]).

The method of constructing the difference scheme used in this paper is based on the
approximation of the integral identity (variational equation) by a summation identity [4-6]. The
fundamental properties of the differential problem are then preserved for the difference scheme,
which permits existence and uniqueness conditions for the solution of the difference scheme to be
obtained comparatively simply.

It is shown that under the uniqueness conditions the difference scheme has accuracy $O(h^2)$ on a
sufficiently smooth solution.


1. Statement of the problem. Investigation of the 
existence and uniqueness of the solution 


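It is known that the strong bending of thin flexible plates can be described by the following
system of equations [1]:

$$-\frac{\partial}{\partial x_1}(\varepsilon_1 + \nu\varepsilon_2) - \frac{1-\nu}{2}\frac{\partial \gamma}{\partial x_2} = P_1, \qquad -\frac{\partial}{\partial x_2}(\varepsilon_2 + \nu\varepsilon_1) - \frac{1-\nu}{2}\frac{\partial \gamma}{\partial x_1} = P_2,$$

$$\frac{a^4}{12}\Delta^2 w - a^2\left\{\frac{\partial}{\partial x_1}\left[(\varepsilon_1 + \nu\varepsilon_2)\frac{\partial w}{\partial x_1} + \frac{1-\nu}{2}\gamma\frac{\partial w}{\partial x_2}\right] + \frac{\partial}{\partial x_2}\left[(\varepsilon_2 + \nu\varepsilon_1)\frac{\partial w}{\partial x_2} + \frac{1-\nu}{2}\gamma\frac{\partial w}{\partial x_1}\right]\right\} = Q, \qquad (1)$$

where $a = h'(\mathrm{mes}^{1/2}\,\Omega')^{-1}$, $h'$ is the thickness of the plate, $\Omega'$ is the region occupied by the plate,
$\varepsilon_1$, $\varepsilon_2$, $\gamma$ are the deformations of elongation and shear of the mean surface,

$$\varepsilon_i = \frac{\partial u_i}{\partial x_i} + \frac{a^2}{2}\left(\frac{\partial w}{\partial x_i}\right)^2, \quad i = 1, 2, \qquad \gamma = \frac{\partial u_1}{\partial x_2} + \frac{\partial u_2}{\partial x_1} + a^2\frac{\partial w}{\partial x_1}\frac{\partial w}{\partial x_2};$$

$x_1$, $x_2$ are dimensionless coordinates, $u = (u_1, u_2)$ is the dimensionless vector of the displacements in the
$x_1 x_2$-plane, $w$ is the vertical displacement of points of the mean surface, and $P = (P_1, P_2)$, $Q$ are the
components of the external load.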


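The dimensionless variables are introduced as follows:

$$x_i = x_i'(\mathrm{mes}^{1/2}\,\Omega')^{-1}, \quad u_i = u_i'(\mathrm{mes}^{1/2}\,\Omega')^{-1}, \quad w = w'/h', \quad P_i = P_i'\,\mathrm{mes}^{1/2}\,\Omega'/B, \quad Q = Q'h'/B.$$

Here $B = Eh'/(1-\nu^2)$ is the longitudinal rigidity, $E$ is Young’s modulus, and $0 < \nu < 1$ is
Poisson’s ratio.

We will consider the case of a plate rigidly clamped along the contour:

$$u|_\Gamma = 0, \qquad w|_\Gamma = 0, \qquad \left.\frac{\partial w}{\partial n}\right|_\Gamma = 0, \qquad (2)$$

where $\Gamma$ is the boundary of the domain $\Omega$, and $n$ is the normal to $\Gamma$.

We define the generalized solution of problem (1), (2) as a vector function $(u, w)$,
$u_1, u_2 \in \mathring{W}_2^{(1)}$, $w \in \mathring{W}_2^{(2)}$, satisfying the integral identity

$$\int_\Omega \Big\{ (\varepsilon_1 + \nu\varepsilon_2)\Big(\frac{\partial \eta_1}{\partial x_1} + a^2\frac{\partial w}{\partial x_1}\frac{\partial \xi}{\partial x_1}\Big) + (\varepsilon_2 + \nu\varepsilon_1)\Big(\frac{\partial \eta_2}{\partial x_2} + a^2\frac{\partial w}{\partial x_2}\frac{\partial \xi}{\partial x_2}\Big)$$
$$+ \frac{1-\nu}{2}\gamma\Big(\frac{\partial \eta_1}{\partial x_2} + \frac{\partial \eta_2}{\partial x_1} + a^2\frac{\partial w}{\partial x_1}\frac{\partial \xi}{\partial x_2} + a^2\frac{\partial w}{\partial x_2}\frac{\partial \xi}{\partial x_1}\Big) + \frac{a^4}{12}\Delta w\,\Delta\xi \Big\}\,dx = \int_\Omega (P_1\eta_1 + P_2\eta_2 + Q\xi)\,dx \qquad (3)$$

for any $\eta_1, \eta_2 \in \mathring{W}_2^{(1)}$, $\xi \in \mathring{W}_2^{(2)}$.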


In this paper the following a priori estimates of the solution of problem (1), (2) are 
essentially used. 


Lemma 1 


For any $P_1, P_2 \in W_2^{(-1)}$ the a priori estimate


Patt, c’a? 4 3—v 
lu \l wo <— 


9 


"5 ‘ > 
wll + — [Pllweo : 
wade Be 


1—v 


holds; if the condition 


4 4{—y \ ‘he 
[Pllws9<—( : ) ( 


c’a’” \3-—v 


is also satisfied, then 


llwllwe<K ($) (IPllwotllllwse), 


K (6) = max a: =), 


llallwo=llullwotlualleo, WP llwco=llPalleycot ll Pall we, 


c is the constant from the inequality 

|v <clv 

i yo S | lar 
valid for any functiony of W,) [7]. 


Proof. 1. Putting $\eta_1 = u_1$, $\eta_2 = u_2$, $\xi = 0$ in (3) and using the Cauchy inequality and the
inequality (7), we obtain


2 
ns 
Ox, OX» =) 
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peal ea 

a Ox, Ox, OxX2 Ox. Ox, Ox, 

POT Siren | da i 
O22 Oz, 


Ox, Ox, 
a? / 3—v 
<||Pllweollullwo + % ( 


)] dz +f (P,u,+P.u,) dx 


a ‘ fe 
) ° etlwilirg I 


Then using the easily proved inequality 


we obtain (4). 


2. Now putting $\eta_1 = 2u_1$, $\eta_2 = 2u_2$, $\xi = w$ in (3), we obtain


at 
D l|wllwe<2I|Pllw-ollall wot lll weallwllwe . 
Using (4), we have 


aS 9 


» a 2 Ip 
5 Walling <2IP loro ( ~)"lwline +2 


Pliw. 3) 
Pll we i 


ee 


+1lQllwsollwll wv , 


4 a "Yo 4 
(= — ca? (<=) “IP llws) lollng < — IP ling 


1—v 


4 ) 
bis Olli» toe llwllwy. 


Choosing 6,65, we obtain 
1 


-- Qllws-». 


lwllwe < I|Pllws-o + 


5(1—v) 
Using this lemma and the method of [3] , we can prove the following theorem. 


Theorem 1


Let $P_1, P_2 \in W_2^{(-1)}$, $Q \in W_2^{(-2)}$, and let condition (5) be satisfied. Then problem (1), (2)
has at least one generalized solution.


To investigate the uniqueness of the solution of problem (1), (2) we will require Lemma 2. 
Lemma 2 


Let condition (5) be satisfied, then 


i— 2 1 
J (eft2veertee +7") de<—— Plot — lilies.) 
2 1—v : 86 : 
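Proof. Putting $\eta_1 = 2u_1$, $\eta_2 = 2u_2$, $\xi = w$ in (3), we obtain

$$\int_\Omega \Big\{ \frac{a^4}{12}(\Delta w)^2 + 2\Big[\varepsilon_1^2 + 2\nu\varepsilon_1\varepsilon_2 + \varepsilon_2^2 + \frac{1-\nu}{2}\gamma^2\Big] \Big\}\,dx$$
$$= 2\int_\Omega (P_1u_1 + P_2u_2)\,dx + \int_\Omega Qw\,dx \le 2\|P\|_{W_2^{(-1)}}\|u\|_{W_2^{(1)}} + \|Q\|_{W_2^{(-2)}}\|w\|_{W_2^{(2)}}.$$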
Using (4), we have 


4-—v 


4 
ay J (Aw)%de +2 f (c,¢+ 2veves + est + *) de 
Q Q 


3-—v \'2 2 
) {| w|| wo + ——— |Pllw-») 
yy : iy ? 





> ee | 
< 211Pllaco (—( 


+ Qllw-» + wll we. 


Putting §6,/2=6, in this, we obtain (8). 
Lemma 3 


Let $(u, w)$, $(u^{(1)}, w^{(1)})$ be generalized solutions of problem (1), (2), and let


eo "hy 2 Q \'h 
[ f (e+ 2veve + 2 Mu 7") dz | < Z ( ) ’ 
: 2 ; 12c? \3-—v 


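where $\bar\varepsilon_i = (\varepsilon_i + \varepsilon_i^{(1)})/2$, $\bar\gamma = (\gamma + \gamma^{(1)})/2$. Then $u = u^{(1)}$, $w = w^{(1)}$.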





Proof. Putting $\eta_i = u_i - u_i^{(1)}$, $i = 1, 2$, $\xi = w - w^{(1)}$ in the integral identities for $(u, w)$ and
$(u^{(1)}, w^{(1)})$ and subtracting these identities term by term, after simple but laborious transformations
we obtain


f[@- ef)? + Qv(e,—e!" ‘ioc, 1+, Or 7" 


Q 


Toa a‘ 
Be te Saye cat 
ce Wis )* ae + Sf aw w))*dz +1 =0, 
Q 





i$ [LY tw 








d(w—w™) d(w—w") Jac, 


0 (4) 
( (w—w 
Ox, 


OX» 


} (& + v8) + (1 —v) ¥ 


It is easy to see that 
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sani w'’) ) +9) ( ao ) 








«(EY + (ey a 


x { [ (8, + -ve,)? +(1 —v) 92 + (8 tye) '}dz}". 


From this, by (9), we have 
} | (e,— e,°)? + 2v(e,— g:) (e.,— es) +(e, gar )? 


Q 


on 
— (y— 4)? | dz thw — wll <0, 


consequently $u = u^{(1)}$, $w = w^{(1)}$. The lemma is proved.


Remark 1. Lemma 3 shows that if the potential energy of deformation of the mean surface 
is sufficiently small, the plate has a unique equilibrium shape. 


Theorem 2 follows directly from Lemmas 2, 3. 


Theorem 2 


Let condition (5) be satisfied, and let 


af 





Ba Coie 
(se IQ liws-» aS epg |Pllw-) 


Then problem (1), (2) has a unique generalized solution. 


Remark 2. Let $P = 0$. Then $\delta$ can be put equal to $a^4/12$ and condition (10) assumes the form

$$\|Q\|_{W_2^{(-2)}} < \frac{a^4}{6c^2}\,[3(3-\nu)]^{-1/2}.$$

Then, by (6), for the deflection $w$ we obtain the inequality

$$\max_x |w| \le \|w\|_{W_2^{(2)}} \le \frac{12}{a^4}\,\|Q\|_{W_2^{(-2)}} < \frac{2}{c^2[3(3-\nu)]^{1/2}}.$$

For the dimensional deflection $w'$ we obtain

$$\max_x |w'| < \frac{2h'}{c^2[3(3-\nu)]^{1/2}}.$$

In particular, when the plate is rectangular we have $c^2 = 1/\pi$, consequently

$$\max_x |w'| < \frac{2\pi}{[3(3-\nu)]^{1/2}}\,h' \lesssim 2.5h'.$$

This estimate gives an idea of the amount of deflection permitted by the conditions of
uniqueness of the solution.
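To see how the bound of Remark 2 depends on Poisson’s ratio, the coefficient of $h'$ for a rectangular plate ($c^2 = 1/\pi$) can be tabulated. This is a small illustrative sketch; the function name is an assumption.

```python
import math

def deflection_bound_factor(nu):
    """Coefficient of h' in max|w'| < 2*pi / sqrt(3*(3 - nu)) * h' (case c^2 = 1/pi)."""
    if not 0.0 < nu < 1.0:
        raise ValueError("Poisson's ratio is assumed to satisfy 0 < nu < 1")
    return 2.0 * math.pi / math.sqrt(3.0 * (3.0 - nu))
```

For example, `deflection_bound_factor(0.3)` is about 2.2, so under the uniqueness conditions the deflection stays within a few plate thicknesses.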


2. Construction of the difference scheme 


In what follows we will confine ourselves to the case of a rectangular plate

$$\Omega = \{x \mid 0 < x_1, x_2 < 1\}.$$

We construct on $\Omega$ a mesh with steps $h_1$, $h_2$ along the $x_1$, $x_2$ axes.

We introduce notations for the subsets of the mesh, the difference ratios and the sums of the
mesh functions:
6 ={z|z=(isha, ish2) , iz =—1, 0, eee, N,+4, Nih,=1}, 


o={r|xz=(ishi, isha), i,=1, 2, oney N,—1}, 
V={z|r=o, z,=0 or z=1, 0<z;S1, i*j}, 


y= {x|zeq, 2=—h;, or T=1th, —hy<rj<1+h,, ij}, 
TTY, PRN Pe 
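Let $r = (r_1, r_2)$ be a vector whose coordinates can assume the values $\pm 1$. We put

$$\varepsilon_{ir}(y, v) = \varepsilon_{ir} = \partial_{r_i} y_i + \frac{a^2}{2}(\partial_{r_i} v)^2, \qquad \gamma_r(y, v) = \gamma_r = \partial_{r_2} y_1 + \partial_{r_1} y_2 + a^2\,\partial_{r_1} v\,\partial_{r_2} v,$$

$$\partial_{r_i} y = \begin{cases} y_{x_i}, & r_i = +1, \\ y_{\bar x_i}, & r_i = -1, \end{cases} \qquad (y, z)_r = ((y, z)_{r_1}, 1)_{r_2},$$

$$(y, z)_{r_i} = \begin{cases} h_i \sum\limits_{i_i = 1}^{N_i} y_{i_1 i_2} z_{i_1 i_2}, & r_i = 1, \\ h_i \sum\limits_{i_i = 0}^{N_i - 1} y_{i_1 i_2} z_{i_1 i_2}, & r_i = -1, \end{cases}$$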


N,j-1 N;-1 


Yh VOY putin w2l=—Y Wa) 


r 





ij=1 i,=1 


4 2 
lull? = YY (1), PEA ylh ly 
i=i fr 


Ss. 4 2 
liylla -—)' ((Ay)?, 1),, 
$\Delta y = y_{\bar x_1 x_1} + y_{\bar x_2 x_2}$ is Laplace’s difference operator;


| (y,z) | 
haa, ah 


z=40 WZ ln 
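The directed difference ratios and the mesh elongation introduced above can be illustrated in one dimension. The sketch assumes that $r = +1$ selects the forward and $r = -1$ the backward difference ratio; the function names and the reduction to a single coordinate direction are simplifications made only for this example.

```python
def directed_difference(values, i, h, r):
    """Forward difference ratio for r = +1, backward difference ratio for r = -1."""
    if r == 1:
        return (values[i + 1] - values[i]) / h
    if r == -1:
        return (values[i] - values[i - 1]) / h
    raise ValueError("r must be +1 or -1")

def elongation(y, v, i, h, r, a):
    """One-dimensional analogue of eps_r = d_r y + (a^2 / 2) * (d_r v)^2."""
    slope = directed_difference(v, i, h, r)
    return directed_difference(y, i, h, r) + 0.5 * a * a * slope * slope
```

For a pure deflection with unit slope (`y` identically zero, `v = [0.0, 1.0, 2.0]`, `h = 1.0`) the elongation is `a**2 / 2` for both values of `r`, which is the non-linear contribution of the deflection to the strain.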


We denote by $H$ the linear space of mesh vector-functions $Y(y, v) = (y_1, y_2, v)$, defined on $\bar\omega$
and satisfying the boundary condition


(11) 


y ee = 0, 


In the construction of a difference scheme for problem (1), (2) we will start from the integral 
identity (3). 


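Definition. We say that the mesh vector function $Y(y, v) \in H$ is a solution of the
difference scheme for problem (1), (2) if for any mesh function $X(\eta, \xi) \in H$ the summation
identity

$$\sum_r \frac{1}{4}\Big( (\varepsilon_{1r} + \nu\varepsilon_{2r})(\partial_{r_1}\eta_1 + a^2\,\partial_{r_1}v\,\partial_{r_1}\xi) + (\varepsilon_{2r} + \nu\varepsilon_{1r})(\partial_{r_2}\eta_2 + a^2\,\partial_{r_2}v\,\partial_{r_2}\xi)$$
$$+ \frac{1-\nu}{2}\gamma_r(\partial_{r_2}\eta_1 + \partial_{r_1}\eta_2 + a^2\,\partial_{r_1}v\,\partial_{r_2}\xi + a^2\,\partial_{r_2}v\,\partial_{r_1}\xi) + \frac{a^4}{12}\Delta v\,\Delta\xi,\ 1 \Big)_r = (q_1, \eta_1) + (q_2, \eta_2) + (\theta, \xi) \qquad (12)$$

is satisfied, where $q_1$, $q_2$, $\theta$ are the mesh functions approximating the functions $P_1$, $P_2$, $Q$
respectively:

$$\|P_i - q_i\|_{-1} = O(h^2), \qquad \|Q - \theta\|_{-2} = O(h^2).$$

It is easy to see [4] that the function $Y$ then satisfies an equation of the form

$$AY = \Phi, \qquad \Phi = (q_1, q_2, \theta). \qquad (13)$$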


To obtain the explicit form of the first equation of the system (13) at the point 2,0, it 
is sufficient to put in (12) 
(n, §) oe (6; (Z9--2z) ? 0, 0) ’ 


bins 0, LF Xo, 
8:(%o — z -{ ijk, t= 2%, 


if xp is at a distance from Y; greater than 2h;; otherwise 6,(x)—x) is put equal to 1/h,h, 
at points Y closest to x9. The remaining equations of the system (13) are obtained similarly. 


We note that from the difference scheme (13) we obtain as special cases the difference scheme
for the plane problem of the theory of elasticity [8] (for $\theta = 0$, $v = 0$) and that for the problem of
small deflections of the plate (for $y = 0$, $q = 0$).

3. Investigation of the solvability of the difference scheme


In the investigation of the existence and uniqueness of the solution of the difference scheme 
we require the following auxiliary results. 


Lemma 4 
For any mesh function $v$ satisfying conditions (11) the following inequality holds:
lvllacScyllvll2, cy=const. (14) 


Proof. Let ¥ be a mesh function defined on @ and equal to zeroon y+. Then, using 
the difference analog of the embedding theorem from W> 1) into Lg [12], we obtain || ].<c, [5 |]2. 


Now let veH; we put v=v(x) for z=o, J=0, rE. It is easy to see that ||Fl|.=Ilvllas, 
|Fllz<|!v]]2, consequently (14) is satisfied. 


A priori estimates similar to (4), (6) hold for the difference scheme (13). More precisely, the
following lemma holds.


Lemma 5 


For any $q_1$, $q_2$ the estimate


e*a? 4 3—v \ 2 
lull << (=) “toll + Hels 





2 i— 


holds for the solution of problem (13). If the condition 





lpli-sS 


4 (= 
3-Vv 


ca 


is also satisfied, then 
lv |]2’<K (8) (Ipll-?+16l]-27), 


where K (5) =max (8/6 (14—v), 1/67), 


1 : eee ng ae 
vs Gir TT QVEqrEoe + ap? + amen ah. 
4 > 
: 


~ 


2 1 
< l|_.? + — ||O|_.”. 
7 llll-s 36 oll. 


—_— Vv 
The proof of inequalities (15), (17), (18) is very similar to the proof of the corresponding 
inequalities (4), (6), (8). 


Theorem 3 


Let condition (16) be satisfied. Then the difference scheme (13) has at least one solution
for any $\Phi$.

Proof. It is easy to see that the difference scheme can be written as a system of two operator
equations

$$A_1 y = F(v, q), \qquad (19)$$

$$A_2(v, y) = \theta, \qquad (20)$$

where $A_1$ is a linear operator. By (15), Eq. (19) is uniquely solvable for $y$ for any $v$, $q$. Therefore,
system (19), (20) is equivalent to the equation

$$A_2(v, A_1^{-1}F(v, q)) = \theta. \qquad (21)$$

We now show that an $R > 0$ exists such that $(A_2(v, A_1^{-1}F(v, q)) - \theta, v) > 0$ for
$\|v\|_2 = R$. Then, by a well-known topological lemma (see, for example, [10], p. 66), Eq. (21) will
have at least one solution. It is easy to see that


I] =(A,(v, A,'F (v, p)), v) — (0, v) = (A2(v, Ay *F (uv, ) ), v) 


a" ce, 
— (6, v) + (Ay, y)—(F (v,@),y) = 12 lvl." 


4 2 2 iv 2 
+ (eu + Qveir€er + bar + —s te).t) —@ny) 


_ 
r 


— (G2, ¥2) — (8, v). 
Now using the estimate (15), we obtain 


1> jolt — 2g siya 10 1-allole > 0 


2 2 4 2 4 2 
lol? > (<——Ilgl- += loll). 


Theorem 4 


Let condition (16) be satisfied and 


2 


4 2 ‘la a 2 
Bese my 2 oh a =< 
(= lls lel ) (= 


4—v ~ 42¢,? 





Then the difference scheme has a unique solution. 


The proof of this theorem is exactly similar to the proof of Theorem 2. 


4. Estimation of the rate of convergence of the difference scheme 


In investigating the convergence of the solution of the difference scheme (13) to the solution
of problem (1), (2) we will suppose that the functions $u_1$, $u_2$ are four times continuously
differentiable in the domain $\bar\Omega$, and that the function $w$ is six times continuously differentiable
in some closed region $\tilde\Omega \supset \bar\Omega$.

We will require the following auxiliary results.

Lemma 6

Let the vector functions $(y, v)$, $(y^{(1)}, v^{(1)})$ satisfy the equations

$$A_1 y = F(v, q), \qquad (23)$$

$$A_1 y^{(1)} = F(v^{(1)}, q^{(1)}). \qquad (24)$$

Then

$$\|y - y^{(1)}\|_1 \le M\big[(\|v\|_2 + \|v^{(1)}\|_2)\,\|v - v^{(1)}\|_2 + \|q - q^{(1)}\|_{-1}\big]. \qquad (25)$$

Here and below we will denote by $M$ constants independent of the mesh step, possibly
different.


Proof. We subtract Eq. (24) term by term from (23) and take the scalar product of both sides of
the resulting equation with $y - y^{(1)}$. Then, by the definition of the operator $A_1$, we obtain


1 
a a ( (€,,+ve2—€), —vex) 0,, (yi—y:) 


+(e2,+ve—er —vetr) Ons(Ya-Ye ) 


: 
Xp) Oni.) +3,.2-W!”)) 1) 


Tr 


=(9.-9. 7, y:—ys) +(@e—@: 2-2). 


Now collecting on the left side of the last equation the terms containing only $y$, $y^{(1)}$, and
using the Cauchy inequality, we obtain


4 
MY (Gen Qi—ye?)) 24200 (YW) O-nYr-Ye) 
- 


1—v 26 
+ (B-rs(ys—ys?))? + (8-n(Yi-Ws?) +9--n (Ys! )), 1) is 


<M ((llvlla+llv ll) lo—v™ llaFllp—q |-1) ly—y Ils. 


Taking the lower bound of the positive-definite form on the left side of the inequality (26), 
we obtain (25). The lemma is proved. 


Lemma 7 


Let 
Ay=9, Ay®=0", 


1 9 ‘ 9 1—v a - 
|; y (e,2+2vetar tee + 5 Pe ) 


~ 


[|w—w' ||.?<M (6-0 ||_.?+ llp—q |l-ally—y Il,). 
The proof is exactly similar to the proof of Lemma 3. 


Lemma 8 


Let (u, w) be a solution of problem (1), (2), satisfying the smoothness conditions formulated 
above. A function f(x, x2) exists such that 
i, 24 
Of 
acevo OR), = aaa, tac. 
Ox, Ox." 
Proof. Following [11, 12], we will construct the function in the form

$$f(x_1, x_2) = f^{(1)}(x_1, x_2) + f^{(2)}(x_1, x_2),$$

$$f^{(1)}(x_1, x_2) = a_0(x_2)x_1^3 + a_1(x_2)x_1^2 + a_2(x_2)x_1 + a_3(x_2).$$

We define the coefficients $a_k(x_2)$, $k = 0, 1, 2, 3$, from conditions (28) for $i = 1$. Then, by the
boundary conditions (2), the estimate $d^j a_k/dx_2^j = O(h^2)$, $j = 0, 1, \dots, 6$, holds for the
coefficients $a_k(x_2)$ and their derivatives. We will seek the function $f^{(2)}(x_1, x_2)$ in the form

$$f^{(2)}(x_1, x_2) = b_0(x_1)x_2^3 + b_1(x_1)x_2^2 + b_2(x_1)x_2 + b_3(x_1),$$

where we determine the coefficients $b_k(x_1)$ from the condition
f (1, %2)=w (a1, 2) — f" (a1, Xa), 
fe (24, £2) — Ww, (24, L2) ary fe (1, L2) 
for (.r,, 22) 2. 


It is easy to verify that the estimates d’b,/dr=O(h*), j= 6 will then hold for 
the coefficients. 


Therefore, the function f(x, ,* ) and all its derivatives will be of order O(h?). 


We show that f(x;,*2) satisfies the boundary conditions (28). 


The boundary conditions for x = 0, x» = 1 are satisfied by (29). We also note that 
Indeed, by the conditions imposed on $f^{(1)}$, 


(2) a) 
Is, Xy=h i, = 
ag, PD he xiun—he 


Consequently, 
f?(h, 22) =f" (—h, 22), 


The corresponding equations for x; = 1 are verified similarly. The lemma is proved. 


Theorem 5 


Let conditions (16), (22) be satisfied. Then the solution of the difference scheme (13) 
converges to the solution of problem (1), (2) at the rate $O(h^2)$: 


$\|y - u\|_1 + \|w - v\|_2 \le M h^2$, $h^2 = h_1^2 + h_2^2$. 


Proof. Using the smoothness conditions for the functions (u,@), Z=w—f, asin 
[4, 5] we can obtain 


A((u, ®), (n, &))=(@s, 11) F (G2, N2) + (9, §) 
+ (%p;, N.’ F (12, No) + (Po, 5) 


for any vector function (ny, &) =H. 
Here (\, Wg) is the approximation error: 
hyll-:=O(h*), II poll_-2=O(h’). 


We now note that by condition(16)a 6,26>0, can be found such that for all 
h<xhyp, h,>OU 


- 4 — "lo 
lo+vll_.<— -( ~) a. 


3-v 


12c,’ 


Now using Lemma 5, conditions (16) and (22), we obtain 


| 9 ~ ~ 1 2 ~ 
oat \ ( 4r°(u, @) +2ve,,(u, ©) €2-(u, ©) +e2," (u, @) 
‘ a 


an 1) <a Wet ellaa tom erbellas 
+ y.(u, ©), <=— roll-2?+ D Ye 
aka Waele na A 8 Silat 


— r 


a? ie, 4 
< ( -6| 5>0. 


12c,* \3—v 


By (18) a similar inequality is satisfied for (y, v) also. Then using Lemma 7, we obtain 


\|@—vI]2<M (lrpoll-2°+llpll-ally—all.), 
whence, by the inequalities (25), (31), we have 
$\|\omega - v\|_2 = O(h^2)$, $\|w - v\|_2 \le \|f\|_2 + \|\omega - v\|_2 = O(h^2)$. 


Applying once more the inequality (25), we finally obtain 


$\|y - u\|_1 = O(h^2)$. 


The theorem is proved. 


For the numerical realization of the difference scheme (11)—(13) we can use an iterative 
process of the form 


A,y"*'=F (v", p), A?v"t!=A*v"—1(A2(v", y")—9), 


The iterative process (32) presupposes at each step the solution of difference schemes for the 
plane problem of the theory of elasticity and of the biharmonic equation, which can be carried out, 
for example, by known iterative methods [13]. 


The convergence of the method (32) for conditions close to (16), (22), can be investigated 
by the method described in [14, 15]. 


The authors sincerely thank A. D. Lyashko for suggesting the problem and for his continued 
interest. 


Translated by J. Berry. 
NUMERICAL SOLUTION OF THE TWO-DIMENSIONAL PROBLEM OF 
SHOCK WAVE PROPAGATION IN OUTER SPACE* 


L. V. SHIDLOVSKAYA 
Moscow 


(Received 1 December 1975; revised 24 March 1976) 


A METHOD for the numerical solution of the non-stationary two-dimensional problem of the 
propagation relative to a moving interplanetary medium of a perturbation caused by the emission 
of finite energy within the limits of a section of a cone, simulating the chromospheric region of the 
sun occupied by the flare is discussed. The effect of solar gravitation and of the radial component 
of the magnetic field strength on the motion of the gas is taken into account. 


Introduction 


The data on the observation of chromospheric solar flares accumulated up to the present time 
and the measurements of the parameters of the shock waves arising from these flares and propagated 
in interstellar space, draw attention to topics connected with the propagation of perturbations in 
the solar wind. The fundamental results of the analytic and numerical investigations of the dynamic 
processes in the interplanetary medium, contained in [1—6] , were obtained from a study of the 
one-dimensional flow of a plasma on the assumption of spherical symmetry about the sun. However, 
a continuously increasing amount of experimental data testifies to the fact that the shock wave front 
close to the earth’s orbit does not possess spherical symmetry, and accordingly more and more 
complex models of the propagation of perturbations in the solar wind have had to be developed. It 
is known [1,7] that solar chromospheric flares arise and take place in a comparatively small 
volume: the area of a flare occupies about 1% of the area of the solar disk, the height of the flare 
layer is of the order of $10^4$ km. The radiation generated by solar flares is of short duration: from 
0.5 to 3 hours. Sometimes the radiation is far from radial, since some flares cause geophysical effects, 
although they take place close to the edge of the solar disk. All this suggests that the radiation of 
particles from the surface of the sun may occur within the limits of some cone, at times possessing 
a fairly considerable solid angle. The two-dimensional problem of a blast in a moving medium in 
the hydrodynamic approximation was considered in [8]. 


In this paper we present a two-dimensional magnetohydrodynamic model of the motion of a 
plasma, based on a continuous emission of large energies and masses of incandescent gases in the course 
of $5 \times 10^2$ to $5 \times 10^3$ sec into an interplanetary medium moving along a radius from the sun. 
Allowance is made for the interaction of the solar wind with the interplanetary magnetic field, whose 
source is the total magnetic field of the sun. In view of the extreme complexity of the phenomena 
considered it is assumed that the azimuthal component of the magnetic field strength vector 
exerts no significant effect on the motion of the interplanetary gas, although the converse effect 
of the dynamic processes on the perturbation of the interplanetary magnetic field may be fairly 
strong. This constraint is permissible since the experimental data confirm the fact that at distances 
from the sun right up to the earth’s orbit the radial component of the magnetic field predominates 
over the remaining components. Analytic investigations are difficult because of the complex 
mathematical formulation (a boundary value problem for a system of non-linear equations of a 
compressible gas with three independent variables). Therefore, to obtain a solution of the problem 
posed, numerical calculations were carried out by the modified non-stationary “large particle” method [9]. 


1. Statement of the problem 


We consider the system of equations of single-fluid hydrodynamics for a completely ionized, 
non-heat conducting hydrogenous gas using the spherical coordinate system. We suppose that the 
flow is symmetrical about the polar axis, so that all the quantities are functions of the heliocentric 
radius $r$, the polar angle $\theta$ and the time $t$. The polar axis is so chosen that it passes through the 
centre of the region occupied by the flare, which is a section of a cone (see Fig. 1). The equations 
include the forces connected with the pressure gradient and solar gravitation, and the effect of 
dissipative processes is disregarded. We denote by $\rho$ the number density, by $T$ the temperature, by $p$ 
the pressure, by $u$ the velocity of the flow in the radial direction, by $v$ the tangential component 
in the direction of variation of the angle $\theta$, and by $H = H_r$ the magnetic field strength. The 
equations of the conservation of mass, momentum and energy in Eulerian coordinates have the 
form 


[Equations (1.1)–(1.4): not recoverable from the source.] 


Here $G$ is the gravitational constant, $M_\odot$ is the mass of the sun, $E$ is the specific total energy (not 
taking into account the energy of the magnetic field): 


$E = 3kT/(m_+ + m_-) + (u^2 + v^2)/2 - GM_\odot/r$, (1.5) 

FIG. 1. 


where $m_+$ and $m_-$ are the masses of a proton and an electron respectively. For the specific internal energy 
$I$ we have the expression $I = 3kT/(m_+ + m_-)$. The equation of state of a perfect gas is written in 
the form 


$p = 2\rho kT/(m_+ + m_-)$, (1.6) 
where $k$ is Boltzmann’s constant. 
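
The relations (1.5) and (1.6) are simple enough to check numerically. The following is a minimal sketch in CGS units, assuming that $\rho$ is a mass density; the function names and the physical constants are not from the paper and are supplied here only for illustration.

```python
# Illustrative constants in CGS units (assumed values, not quoted in the paper).
K_BOLTZMANN = 1.380649e-16   # erg / K
M_PROTON = 1.67262192e-24    # g
M_ELECTRON = 9.1093837e-28   # g
G_NEWTON = 6.674e-8          # cm^3 g^-1 s^-2
M_SUN = 1.989e33             # g


def specific_internal_energy(T):
    """I = 3kT / (m_+ + m_-), the internal-energy part of Eq. (1.5)."""
    return 3.0 * K_BOLTZMANN * T / (M_PROTON + M_ELECTRON)


def pressure(rho, T):
    """Equation of state (1.6): p = 2 rho k T / (m_+ + m_-)."""
    return 2.0 * rho * K_BOLTZMANN * T / (M_PROTON + M_ELECTRON)


def specific_total_energy(T, u, v, r):
    """Eq. (1.5): internal + kinetic + gravitational energy per unit mass."""
    return specific_internal_energy(T) + 0.5 * (u * u + v * v) - G_NEWTON * M_SUN / r
```

With these definitions $p = (2/3)\rho I$, which is the perfect-gas relation $p = (\gamma - 1)\rho I$ for $\gamma = 5/3$.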


To determine the magnetic field strength H in a medium with infinite conductivity from 
Maxwell’s equations, neglecting the displacement current, we have the following equations: 


$\partial H/\partial t + [\partial (Hv \sin\theta)/\partial\theta]/(r \sin\theta) = 0$, (1.7) 


$\partial (r^2 H)/\partial r = 0$, (1.8) 


to which in this case the whole system of electrodynamic equations essentially reduces. For the 
solution to satisfy Eq. (1.8), it is sufficient to require that it be satisfied by the initial data; therefore 
Eq. (1.8) will be used only for the construction of the initial parameter distribution of the gas flow. 
Thus system (1.1)–(1.7) is closed. 


The electric field strength $\mathbf{E}_e$ and the current density $\mathbf{J}$ can be determined from the formulas 


$\mathbf{E}_e = -[\mathbf{U} \times \mathbf{H}]$, $\mathbf{J} = \operatorname{rot} \mathbf{H}/4\pi$, 


where U is the gas flow velocity vector. 
The general form of system (1.1)–(1.7) can remain the same for dimensional and dimensionless 
variables, so we will use the latter, choosing as characteristic values of the plasma parameters at $r_0$ 
the corresponding experimental data. The motion of the gas occurs in a domain bounded by the 
spherical surfaces $r = r_0$ and $r = r_N$ with variation of the angle $\theta$ from 0 to $\pi$. 


The initial steady flow of the solar wind is defined as follows. It is assumed that the quiescent 
solar wind has only a radial velocity component, that is, $v = 0$ at the initial instant. Using this 
assumption and equating to zero the time derivatives in Eqs. (1.1)–(1.8), we obtain a system of 
ordinary differential equations whose numerical solution is constructed with the boundary values 
$u_0$, $\rho_0$, $p_0$ and $H_0$ specified at the point $r = r_0$. The result of this solution gives the initial 
distribution of the hydrodynamic parameters and the magnetic field strength. 


We assume that at the instant $t = 0$ in the conical section $r_0 \le r \le r_0 + \Delta r$, $0 \le \theta \le \theta_0$, a 
sudden change in the parameters of the gas flow has occurred, the new higher values of these 
parameters $u_{01}$, $\rho_{01}$, $p_{01}$ being determined from the conditions on strong discontinuities for a 
specified initial rate of propagation of the shock wave $u_p$ along the quiescent solar wind [10]. The 
energy is emitted during the time interval $\tau$. The value of the initial angle $\theta_0$ and the thickness of the 
layer $\Delta r$ in which the perturbation is initially concentrated can be specified arbitrarily, and in 
performing numerical calculations $\Delta r$ is taken equal to half the mesh step in the radial direction. 
Therefore, at the initial instant the distribution of the parameters $u$, $v$, $\rho$, $p$, $H$ of the quiescent 
solar wind and of the perturbation concentrated in the section of the cone 
are specified. It is required to determine the flow of the gas at subsequent instants, until the front 
of the shock wave reaches the earth’s orbit, subject to the following boundary conditions: 


$(u, v, \rho, p, H) =$ 
$(u_0, 0, \rho_0, p_0, H_0)$ for $r = r_0$, $\theta_0 < \theta \le \pi$, $t \ge 0$, 
$(u_{01}, 0, \rho_{01}, p_{01}, H_0)$ for $r_0 \le r \le r_0 + \Delta r$, $0 \le \theta \le \theta_0$, $t \le \tau$, 
$(u_0, 0, \rho_0, p_0, H_0)$ for $r_0 \le r \le r_0 + \Delta r$, $0 \le \theta \le \theta_0$, $t > \tau$. 


The requirement of symmetry of the flow about the polar axis imposes one more boundary 
condition on the tangential component of the gas velocity: $v = 0$ for $\theta = 0$ and $\theta = \pi$. The 
parameters of the perturbed flow change not only in space, but also in time. To construct the 
solution of the problem of the non-stationary motion of the gas we use the numerical “large 
particle” method [9] , modified somewhat due to the specific nature of the problem. 


2. Method of calculation 


We subdivide the domain of integration by a fixed Eulerian mesh into cells whose centres 
correspond to points with subscripts $i$, $j$; the volume of each cell is $V_{ij} = 2\pi (r_i)^2 \sin\theta_j\,\Delta r\,\Delta\theta$, where 
$\Delta r$ is the step along the radius, $\Delta r = r_{i+1} - r_i$, and $\Delta\theta$ is the angular step, $\Delta\theta = \theta_{j+1} - \theta_j$. 


We suppose that at the instant $t = n\Delta t$ for every cell $i$, $j$ the determined values of the velocity, 
density, pressure and direction of the magnetic field are known. The problem consists of finding 
these values at the instant $t = (n+1)\Delta t$. 


The calculation of each step is divided into two stages [9]. At the first (Eulerian) stage we 
neglect the flows of mass through the cell boundaries and determine intermediate values of the 
parameters by means of the equations 


The finite-difference equations of first-order accuracy in time and space, corresponding to this 
system, have the form 


At this stage the difference scheme is stable, as will be shown below on the basis of [9]. 


We note that another approach to the construction of the difference scheme at the first stage 
is possible, namely: instead of the equation for the total energy E we can use the equation for the 
internal energy $I$. Then, however, the difference scheme will be non-conservative and to improve 
the stability of calculation at the first stage the pressure p in Eqs. (2.1)—(2.4) must be replaced 
by the term (p + q), where q is an artificial viscous pressure. If c is the local speed of sound, then 
the expression for q can be taken in the form [5] 

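$q^n_{i+1/2,j} = \rho^n_{i+1/2,j}\,(u^n_{ij} - u^n_{i+1,j})\,[B c^n_{i+1/2,j} + A^2 (u^n_{ij} - u^n_{i+1,j})]$ for $u^n_{ij} > u^n_{i+1,j}$, 
$q^n_{i+1/2,j} = 0$ for $u^n_{ij} \le u^n_{i+1,j}$, 


$q^n_{i,j+1/2} = \rho^n_{i,j+1/2}\,(v^n_{ij} - v^n_{i,j+1})\,[B c^n_{i,j+1/2} + A^2 (v^n_{ij} - v^n_{i,j+1})]$ for $v^n_{ij} > v^n_{i,j+1}$, 
$q^n_{i,j+1/2} = 0$ for $v^n_{ij} \le v^n_{i,j+1}$. 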


The introduction of the linear term in the expression for q in the case of weak shock waves leads 
to damping of the oscillations arising behind the shock front. The quadratic term plays a similar 
role in the consideration of strong shock waves. Experimental measurements of the mean 
velocities of shock waves at various points of interplanetary space show that in a number of cases 
a strong shock wave may experience considerable damping; therefore at the earth’s orbit weak 
shock waves are observed. 
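
An artificial viscous pressure with a linear and a quadratic term can be sketched in one dimension as follows. This is only an illustration of the idea described above, not the paper's implementation: the coefficient names `b_lin` and `a_quad`, their default values, and the arithmetic averaging of density and sound speed to the cell faces are all assumptions.

```python
import numpy as np


def artificial_viscosity(rho, u, c, b_lin=0.5, a_quad=2.0):
    """Viscous pressure q on the faces between neighbouring cells (1-D sketch).

    rho, u, c are cell-centred arrays of density, velocity and sound speed.
    q is non-zero only where the gas is being compressed (u[i] > u[i+1]).
    """
    du = u[:-1] - u[1:]                        # positive under compression
    rho_face = 0.5 * (rho[:-1] + rho[1:])      # assumed face average
    c_face = 0.5 * (c[:-1] + c[1:])            # assumed face average
    # linear term (damps oscillations behind weak shocks) plus quadratic term
    q = rho_face * du * (b_lin * c_face + a_quad ** 2 * du)
    return np.where(du > 0.0, q, 0.0)
```

The linear term dominates when the velocity jump is small compared with the sound speed, and the quadratic term dominates for strong compression, which matches the roles described in the text.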


At the second (Lagrangian) stage we calculate the density of the mass flow in the motion of the 
gas through the cell boundaries. Let $M^n_{i+1/2,j}$ be the flow of mass through the boundary $(i + 1/2)$ 
of the cell $i$, $j$ in the time $\Delta t$, and $M^n_{i,j+1/2}$ the similar flow through the boundary $(j + 1/2)$: 


The third stage of the calculations consists of determining the final values of the flow 
parameters $\rho$, $X = (u, v, E)$ corresponding to the instant $t = (n+1)\Delta t$, on the basis of the laws 
of conservation of mass, momentum and energy for every cell by the formulas 


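$\rho^{n+1}_{ij} = \rho^n_{ij} + (M^n_{i-1/2,j} + M^n_{i,j-1/2} - M^n_{i+1/2,j} - M^n_{i,j+1/2})/V_{ij}$, (2.5) 


$X^{n+1}_{ij} = \dfrac{\rho^n_{ij}}{\rho^{n+1}_{ij}}\,\tilde X^n_{ij} + \dfrac{\tilde X^n_{i-1,j} M^n_{i-1/2,j} + \tilde X^n_{i,j-1} M^n_{i,j-1/2} - \tilde X^n_{ij}\,(M^n_{i+1/2,j} + M^n_{i,j+1/2})}{\rho^{n+1}_{ij} V_{ij}}$. (2.6) 

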
The specific internal energy and temperature are determined from the relations, true for any 
instant of time: 


To determine the magnetic field strength at the instant $t = (n+1)\Delta t$ it is proposed to use 
the following finite-difference equation corresponding to the induction equation (1.7): 
$H^{n+1}_{ij} = H^n_{ij} - \Delta t\,(H^n_{ij} v^n_{i,j+1/2} \sin\theta_{j+1/2} - H^n_{i,j-1} v^n_{i,j-1/2} \sin\theta_{j-1/2})/(r_i \sin\theta_j\,\Delta\theta)$. (2.8) 
The difference scheme is conservative as a whole, since at each stage the conservation laws 
are used, and the total variation of mass, momentum and energy after $\Delta t$ equals the sum of their 
variations in the first and third stages. 
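
The conservation property of the transport and final stages can be illustrated by a one-dimensional donor-cell sketch. It is a simplified analogue under stated assumptions, not the two-dimensional scheme of the paper: the function name, the uniform Cartesian mesh, and the upwind choice of the transported values are all hypothetical.

```python
import numpy as np


def transport_step(rho, x_tilde, u_face, dt, dx):
    """One conservative transport step on a uniform 1-D mesh (sketch).

    rho, x_tilde : cell values of density and of a specific quantity
                   (velocity or energy) after the intermediate stage, length N
    u_face       : velocities on the N + 1 cell faces
    Returns the new density and the new specific quantity in every cell.
    """
    # pad with copies of the edge cells so every face has two neighbours
    rho_pad = np.concatenate(([rho[0]], rho, [rho[-1]]))
    x_pad = np.concatenate(([x_tilde[0]], x_tilde, [x_tilde[-1]]))
    faces = np.arange(len(u_face))
    # donor cell: left neighbour if the flow is to the right, else right neighbour
    donor = np.where(u_face > 0.0, faces, faces + 1)
    mass_flux = rho_pad[donor] * u_face * dt      # mass crossing each face
    x_flux = x_pad[donor] * mass_flux             # transported quantity
    rho_new = rho + (mass_flux[:-1] - mass_flux[1:]) / dx
    x_new = (rho * x_tilde * dx + x_flux[:-1] - x_flux[1:]) / (rho_new * dx)
    return rho_new, x_new
```

Because every face flux is added to one cell and subtracted from its neighbour, the totals of `rho` and of `rho * x` change only through the two outer faces.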


The difference equations in the form (2.5)–(2.8) are true only for the internal cells of the 
domain considered. We choose the external boundary $r_N$ in such a way that it coincides with the 
centre of the $N$-th cell; it is then required simply to set the flow variables in the fictitious 
external cell equal in value to those of the internal cell adjacent to it. On the boundary $r_0$ the values of 
all the hydrodynamic parameters are specified at all instants of time and need not be computed. 
3. Viscosity effects and the stability condition 


The magnetohydrodynamic equations for an inviscid gas are regarded as the fundamental 
equations for describing the motion of the plasma. However viscous effects arise due to the 
introduction into the equations of an artificial viscous pressure q and the presence of scheme 
viscosity, arising from the replacement of the exact differential equations by finite-difference 
approximations. Topics connected with scheme viscosity are studied in [9]. Expanding the 
difference operators of the scheme of the problem considered at all stages in Taylor series, we obtain 
Similar investigations can be carried out for Eqs. (1.3)–(1.4) also, producing qualitatively 
similar results. 


Analyzing Eqs. (3.1), we easily notice that even for $q = 0$ terms with $\varepsilon$, $\eta$ are present, which 
are similar to the dissipative terms of the Navier-Stokes equation; therefore blurring of the shock 
wave front occurs in the case $q = 0$ also. 


As stability conditions for Eqs. (1.1)—(1.7) we use the usual Courant condition (or the speed 
of sound condition), consisting of the assertion that the time steps must be less than the time 
interval in which the sound signal reaches the boundaries of the adjacent cells. Taking into account 
the fact that the perturbation propagates along the solar wind whose velocity is non-zero, and the 
fact that the perturbed motion takes place along two directions in space, we propose to use the 
stability condition in the following form: 


$\Delta t = \beta \min[\Delta r/(c + |u|),\ r\Delta\theta/(c + |v|)]$, (3.2) 


where $\beta < 1$ is some coefficient, and $c = (\gamma p/\rho)^{1/2}$ for gases. 
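
A time step of the form (3.2) can be evaluated over the whole mesh as in the sketch below. The function name, the array layout (radius along the first axis, angle along the second), the default value of the coefficient and the adiabatic sound speed with $\gamma = 5/3$ are assumptions made for the example.

```python
import numpy as np


def courant_time_step(r, dr, dtheta, u, v, p, rho, beta=0.5, gamma=5.0 / 3.0):
    """Time step limited by the sound-signal crossing time of a cell, cf. (3.2).

    r            : radii of the cell centres, shape (Nr, 1)
    u, v, p, rho : cell values, shape (Nr, Ntheta)
    beta         : safety coefficient, must be below 1 (assumed default)
    """
    c = np.sqrt(gamma * p / rho)                 # local speed of sound
    dt_radial = dr / (c + np.abs(u))
    dt_angular = r * dtheta / (c + np.abs(v))    # arc length r * dtheta
    return beta * min(dt_radial.min(), dt_angular.min())
```

Taking the minimum over all cells means the most restrictive cell, usually one near the inner boundary where $r\Delta\theta$ is smallest, sets the step for the whole mesh.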


We show that with the estimate (3.2) the original system of equations (1.1)—(1.7) is stable 
in the sense of [9], that is, the sign of the coefficients $a_i$ of the dissipative terms of the differential 
approximation, containing second-order partial derivatives with respect to the spatial variables, is 
positive. Expanding the difference equations at all three stages in Taylor series accurate to terms 
of the first order in time and of the second order in space inclusive, we obtain 
where $A_1$, $A_2$, $A_3$, $A_4$, $A_5$ are terms of the first differential approximation proportional to 
$\Delta r$, $r\Delta\theta$ and containing first derivatives. For the calculations we chose 


$\delta r = 0.01$, $\delta\theta = 0.1$, $\delta t = 0.0008$, $0.1 \le r \le 1.1$. 


To the motion of the shock wave relative to the gas there correspond the estimates 
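$|\partial(ur^2)/\partial r|\,\delta r < 0.2$, $|\partial(v \sin\theta)/\partial\theta|\,\delta\theta < 0.3$, 


$|\partial p/\partial r|\,\delta r < 3$, $|\partial p/\partial\theta|\,\delta\theta < 0.1$. 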


It is easy to see that the coefficients of the second derivatives 
will be positive, that is, this system of difference equations is stable. 


4. Results of the calculations 


As mentioned above we put the external boundary of the domain considered $r_N$ equal to 
1.1 a.u., and the internal boundary $r_0$ equal to 0.1 a.u. This makes it possible to consider dynamic 
processes in the interplanetary medium outside the regions of solar chromospheric flares, without 
going into the complex nature of their formation. As the scale for measuring distances we choose 
the distance from the sun to the earth’s orbit, that is, 1 a.u. $= 1.495 \times 10^8$ km; as characteristic 
parameters of the flow we take the values $u_0$, $T_0$, $\rho_0$ and $H_0$ for $r = r_0$, corresponding to the 
data of observations: $u_0 = 339$ km/sec, $T_0 = 7.9 \times 10^5$ °K, $\rho_0 = 1.26 \times 10^3$ cm$^{-3}$, $H_0 = 4 \times 10^{-3}$ gauss. 


Numerical calculations were performed for a flare characterized by the energy 0.6 X 102° erg, 
velocity of the shock wave at the initial instant in the radial direction $u_p = 1500$ km/sec, 
semi-vertical angle of the cone in which the initial perturbation was concentrated, $\theta_0 = 23°$, initial 
thickness of the perturbed layer $\Delta r = 0.01$ a.u. The time interval in which the main energy of the 
flare was emitted was put equal to 0.1 hour. In performing the numerical calculations it was 
considered that the radial step $h = 0.005$ a.u., the angular step $\Delta\theta = 5°45'$, the angle $\theta$ being 
measured from a line connecting the centres of the sun and the earth. 


To estimate the accuracy of the numerical calculations a trial calculation was performed with 
the radial step doubled. The relative variation of the results in the main parameters desired turned 
out not to exceed 14%, which testifies to an acceptable degree of accuracy. 


Using Eq. (1.8) and Parker’s model of the interplanetary magnetic field [1], we can calculate 
the parameters of the unperturbed magnetic field: $H_{rc} = H_0 (r_0/r)^2$, $H_{\theta c} = 0$, 
$H_{\varphi c} = H_0 r_0^2\,\Omega \cos\theta_1 / (u(r)\,r)$. Here $\theta_1$ is the angle between the polar axis and the plane of the ecliptic of the sun, 
$\Omega = 2.7 \times 10^{-6}$ sec$^{-1}$ is the angular velocity of the sun’s rotation, and $u(r)$ is the velocity of the 
quiescent solar wind. 
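
The unperturbed field components can be evaluated directly. The sketch below assumes the radial component falls off as $(r_0/r)^2$ and the azimuthal component as $r_0^2 \Omega \cos\theta_1/(u r)$; the function name is hypothetical, and the default parameter values are illustrative readings of the damaged source rather than confirmed data.

```python
import math

AU_KM = 1.495e8  # one astronomical unit in km (assumed scale)


def parker_field(r_au, u_km_s, h0=4.0e-3, r0_au=0.1, omega=2.7e-6, theta1_deg=0.0):
    """Radial, azimuthal and total unperturbed field strength in gauss (sketch).

    r_au    : heliocentric distance in a.u.
    u_km_s  : quiescent solar wind speed at that distance, km/sec
    h0      : field strength at the inner boundary r0_au (illustrative value)
    omega   : angular velocity of solar rotation, 1/sec
    """
    h_r = h0 * (r0_au / r_au) ** 2
    r0_km = r0_au * AU_KM
    r_km = r_au * AU_KM
    h_phi = h0 * r0_km ** 2 * omega * math.cos(math.radians(theta1_deg)) / (u_km_s * r_km)
    return h_r, h_phi, math.hypot(h_r, h_phi)
```

For example, `parker_field(1.0, 417.0)` gives the two components and the total field at the earth's orbit for a wind speed of 417 km/sec; under these assumed inputs the two components come out comparable in magnitude.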


Considering that the lines of flow deviate little from the radial direction, using the freezing-in 
integral and the law of conservation of mass, it is easy to obtain an approximate expression for the 
azimuthal component of the intensity vector of the perturbed magnetic field: 


$H_\varphi(r, \theta) = H_{\varphi c}\,\rho(r, \theta)/\rho_c(r, \theta)$, 


where $\rho(r, \theta)$, $\rho_c(r, \theta)$ are respectively the densities of the perturbed and quiescent solar 
wind. The total strength of the interplanetary magnetic field $H$ is given by $H = (H_r^2 + H_\varphi^2)^{1/2}$. 


Figure 2 shows the shape and leading front of the shock wave as it propagates through the 
interplanetary medium. The shock wave front for the case where the magnetohydrodynamic effects 
are taken into account is shown by the continuous curves and reaches 1 a.u. for $\theta = 0°$ after 60.4 
hours. The dashed curves correspond to the shock wave front when the effect of the magnetic 
field is neglected. It is seen from Fig. 2 that the magnetic field delays the shock wave, the arrival 
time of its leading front at the earth’s orbit for $\theta = 0°$ being increased by 5 hours for the version 
considered. It is easy to see that the shock wave does not penetrate into the exterior of the region 
bounded by a cone with semi-vertical angle $\theta = 46°$. 





FIG. 2. 


Position of the leading front of the shock wave: 1 is for t = 10.5 hours, 2 is for t = 21 hours, 3 is for 
t = 31.5 hours, 4 is for t = 42 hours, 5 is for t = 52.5 hours. 











FIG. 3. 


Velocities of the perturbed flow: 1 is for $\theta = 0°$, 2 is for $\theta = 23°$, 
3 is for $\theta = 46°$. 
FIG. 4. 


Densities of the perturbed flow: 1 is for $\theta = 0°$, 2 is for $\theta = 23°$, 
3 is for $\theta = 46°$. 


The arrival time of the shock wave front at points of observation situated at the same distance 
from the sun, but at different angles to the axis of symmetry of the flare, will be different, as will 
also be the values of the velocities recorded (see Fig. 3) and the densities of the perturbed flow 
(see Fig. 4). The dashed curves here correspond to the distribution of the flow parameters at the 
instant $t_1 = 22$ hours, the continuous curves to the distribution of the velocity and density of the 
gas at the instant $t_2 = 36$ hours at the same angles. 
FIG. 5. 


Variation of the tangential component $v$: 1 is for $\theta = 23°$, 2 is for $\theta = 46°$. 
FIG. 6. 


Distribution of pressure as a function of distance: 1 is for t = 23.5 hours, 
2 is for t = 37.4 hours. 
FIG. 7. 


$\theta = 0°$, $\theta_1 = 15°$. 


The curves with the symbol c in Figs. 3—7 show the distribution of the magnetohydrodynamic 
parameters in the stationary flow, that is, in the quiescent solar wind. 


Figure 5 shows how the tangential component of the flow velocity $v$ varies as a function of 
the distance $r$ and the angle $\theta$ for two instants of time: $t_1 = 22$ hours (dashed curves) and $t_2 = 
36$ hours (continuous curves). It is interesting to see that the velocity $v$ attains values up to 
70 km/sec at distances of 0.1 to 0.5 a.u. from the region occupied by the flare, but it makes no 
significant contribution to the total velocity of the flow at great distances from the sun (of the order 
0.7 to 1 a.u.). It should also be mentioned that because of its three-dimensional nature the motion 
of the surface of the flow in the perturbed region has an extremely complex form, and in particular 
attention may be directed to the occurrence of zones with negative values of the tangential velocity 
component. 


Figure 6 gives a comparison of the pressure distribution of the gas as a function of the 
distance $r$ for $\theta = 0°$ for two instants in the case where the magnetohydrodynamic effects are taken 
into account (continuous curves) and in the case where the effect of the magnetic field on the 
motion of the gas is neglected (dashed curves). Calculations did not reveal a significant effect of the 
magnetic field on the amplitude of the pressure, though the sum of the hydrodynamic and magnetic 
pressures may give an increase above the pressure ignoring the magnetic field of up to 10%. 


The energy of the flare was calculated as indicated in [6]. 


Figure 7 shows the total strength of the magnetic field as a function of the distance r at the 
instant $t_2 = 36$ hours (Fig. 7, a) and of the time $t$ measured from the beginning of the solar flare, if 
the observations are made from the point r= 7, (Fig. 7, b). It is interesting to note, that the nature 
of the dependence of $H$ on $t$ is the same as that observed in reality by means of magnetometers 
mounted on spacecraft and satellites. On the passage of a shock wave the magnetometers 
register a sharp jump of the magnetic field strength, corresponding to the sudden beginning of a 
geomagnetic storm on the earth, and then a sharp fall of intensity in comparison with the magnetic 
field intensity of the quiescent solar wind, which corresponds to the so-called principal phase of a 
terrestrial geomagnetic storm. 


It must be borne in mind that all the results relate to calculations for flares with energies of 
the order of $10^{31}$ ergs. The possibility is not excluded that for higher energies some of these 
results will require correction. 


As a result of the numerical calculations performed the following parameter values of the 
quiescent solar wind at the earth’s orbit were obtained: velocity of the wind $u = 417$ km/sec, 
density $\rho = 10$ cm$^{-3}$, and strength of the total magnetic field $H = 5.54 \times 10^{-5}$ gauss. After the 
passage of the front of the shock wave through a point situated at a distance 1 a.u. from the sun 
for $\theta = 0°$, which occurs 60.4 hours after the beginning of the development of the flare, the 
parameters of the perturbed solar wind acquire the following values: $u = 470$ km/sec, $\rho = 19$ cm$^{-3}$, 
and $H = 7.75 \times 10^{-5}$ gauss. 


The data of observations in interplanetary space, accumulated in recent decades, testify to the 
fact that at the earth’s orbit the velocity and density of the quiescent solar wind are respectively 
of the order of 400 km/sec and 10 cm$^{-3}$, the magnetic field intensity varies within the limits $10^{-5}$ 
to 10~® gauss, that the perturbation arrives at the earth’s orbit 40 to 70 hours after the beginning 
of the solar flare, the velocity of the perturbed solar wind at the earth’s orbit is of the order of 420 
to 600 km/sec, and the density of the gas on the passage of the shock wave is increased by a factor 
of 2 to 3, as is also the magnetic field intensity. The results of the numerical solution discussed 
above agree with these data with an acceptable degree of accuracy, which permits us to regard the 
model of the propagation of the perturbation from a solar flare through the interplanetary gas, 
proposed in this paper, as completely satisfactory. 


The numerical calculations were performed on the BESM-6 computer. 
The author thanks V. P. Korobeinikov for his interest. 
Translated by J. Berry. 


REFERENCES 


1. PARKER, E. Dynamic processes in the interplanetary medium (Dinamicheskie protsessy v mezhplanetnoi 
srede), “Mir”, Moscow, 1965. 


2. KOROBEINIKOV, V. P. and NIKOLAEV, Yu. M. Shock waves in the configuration of magnetic fields 
in interplanetary space. Cosmic Electrodynamics, 3, 1, 12–32, 1972. 


3. SIMON, M. and AXFORD, W. J. Shock waves in interplanetary medium. Planet. Space Sci., 14, 9, 
901–908, 1966. 


4. HUNDHAUSEN, A. J. and GENTRY, R. A. Numerical simulation of flare-generated disturbances in solar 
wind. J. Geophys. Res., 74, 11, 2908–2919, 1969. 


5. KOROBEINIKOV, V. P. and SHIDLOVSKAYA, L. V. Numerical solution of problems of a flare in a moving 
gas. In: Numerical methods of the mechanics of a continuous medium (Chisl. metody mekhan. 
sploshnoi sredy). Inf. byull. Vol. 6, No. 4, “Nauka”, Novosibirsk, 56–68, 1975. 


6. SHIDLOVSKAYA, L. V. On the propagation of perturbations in interplanetary plasma caused by solar flares. 
Dokl. Akad. Nauk SSSR, 225, 2, 39–43, 1975. 


7. BRANDT, J. and HODGE, P. Solar System Astrophysics (Astrofizika solnechnoi sistemy), “Mir”, Moscow, 
1967. 


8. De YOUNG, D. S. and HUNDHAUSEN, A. J. Two-dimensional simulation of flare-associated disturbances 
in the solar wind. J. Geophys. Res., 76, 10, 2245–2253, 1971. 


9. BELOTSERKOVSKII, O. M. and DAVYDOV, Yu. M. A non-stationary “coarse particle” method for 
gas-dynamical computations. Zh. vychisl. Mat. mat. Fiz., 11, 1, 176–207, 1971. 


10. KULIKOVSKII, A. G. and LYUBIMOV, G. A. Magnetohydrodynamics (Magnitnaya gidrodinamika), 
Fizmatgiz, Moscow, 1962. 


CONVERGENCE OF THE ITERATIVE PROCESS FOR 
THE QUASILINEAR HEAT TRANSFER EQUATION* 


V. I. MASLYANKIN 
Moscow 


(Received 10 May 1976) 


THE possibility of a numerical solution of the quasilinear heat-transfer equation by means of 
through-calculation difference schemes is discussed. It is shown that the convergence of the iterative 
process depends on the choice of the interpolation formula for the heat transfer coefficient. 
Approximate estimates of the regions of applicability of these formulas and the results of numerical 
calculations are presented. 


1. Introduction 


This paper is devoted to the numerical solution of the heat-conduction equation 


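$\dfrac{\partial \varphi(u)}{\partial t} = \dfrac{\partial}{\partial x}\left(K(u)\dfrac{\partial u}{\partial x}\right)$ (1) 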





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 209-216, 1977. 
by means of homogeneous through-calculation difference schemes. As shown in [1], such schemes 
can be used to calculate generalized solutions of the temperature-wave type. The purpose of the 
paper is to analyze some interpolation formulas for the thermal conductivity $K(u)$, occurring in the 
difference operator approximating the right side of (1). 


In sections 3 and 4 it is shown that on the introduction of an iterative process for solving 
the difference system of equations corresponding to (1), constraints arise on the applicability of 
some interpolation formulas for the thermal conductivity. But if these constraints are not applied, 
then the numerical solution may differ qualitatively from the actual solution. The conclusions drawn 
in sections 3 and 4 are confirmed by the numerical calculations presented in section 5. 


We consider a heat wave propagating against a zero background. Let 


$K(0) = \varphi(0) = 0$, $K(u) > 0$, $\varphi'(u) > 0$ for $u > 0$, 


lim = o | = (), 


u—>0 


In this case the wave front propagates with finite velocity. Denoting the position of the front 
at the instant $t$ by $\xi(t)$ (Fig. 1), we can obtain the following expression for the velocity of the 
front [1]: 


$\dfrac{d\xi}{dt} = -\lim_{x \to \xi(t) - 0} \dfrac{K(u)}{\varphi(u)}\,\dfrac{\partial u}{\partial x}$. (2) 


As examples in the calculations the analytic solution of Eq. (1) was used, which represents a running wave against a zero background (the thermal conductivity is taken in the form $K(u) = \kappa_0 u^{\sigma}$, and we assume $\varphi(u) = u$):

$$u(t, x) = \begin{cases} \left[\sigma c\,\kappa_0^{-1}(ct + x_1 - x)\right]^{1/\sigma}, & x \le x_1 + ct, \\ 0, & x_1 + ct \le x, \end{cases} \qquad (3)$$

where $c$ is the velocity of motion of all the points of the profile, $c = \text{const}$. In the calculations we put $x_1 = 0$.
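The running-wave solution (3) is simple to evaluate numerically, which is convenient when it serves as a reference for a difference scheme. The following is a minimal sketch, assuming the power-law conductivity $K(u) = \kappa_0 u^{\sigma}$ and $\varphi(u) = u$ stated above; the function name `running_wave` and its argument names are illustrative and do not come from the paper.

```python
import numpy as np

def running_wave(t, x, sigma, kappa0, c, x1=0.0):
    """Evaluate the analytic running wave (3) for K(u) = kappa0 * u**sigma, phi(u) = u."""
    x = np.asarray(x, dtype=float)
    # Distance behind the front; it is negative ahead of the front, where u = 0.
    behind = np.clip(c * t + x1 - x, 0.0, None)
    return (sigma * c / kappa0 * behind) ** (1.0 / sigma)

# Profile at t = 0.1 for sigma = 2, kappa0 = 0.5, c = 5 (front located at x = 0.5).
u = running_wave(0.1, np.array([0.0, 0.25, 0.5, 0.75]), sigma=2, kappa0=0.5, c=5.0)
```

With these parameter values the boundary value at $x = 0$ reduces to $10\,t^{1/2}$, and the profile vanishes identically ahead of the front.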

2. Scheme of the calculation. Interpolation formulas for the thermal conductivity


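We suppose that $K(u) = \kappa_0 u^{\sigma}$, in which case (1) assumes the form

$$\frac{\partial \varphi(u)}{\partial t} = \frac{\partial}{\partial x}\left(\kappa_0 u^{\sigma}\,\frac{\partial u}{\partial x}\right). \qquad (4)$$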


The boundary conditions for (4) have the form 


u(t,0)=p(t), u(t, 1) =a (t). 


Equation (4) is replaced by an implicit homogeneous difference scheme which is a modification of the scheme presented in [1]:

$$\varphi(v_i) - \varphi(\check v_i) = A_{i+1}(v_{i+1} - v_i) - A_i(v_i - v_{i-1}), \qquad (5)$$

where $A_i = (\tau/h^2) K_i$, and for the $K_i$ we use one of the three interpolation formulas described in [2] (the general method enabling these formulas to be obtained has been described previously in [3-6]):

$$K_i = \frac{1}{\theta_i}\,K\!\left(\frac{v_{i-1} + v_i}{2}\right), \qquad (6)$$

$$K_i = \frac{2K(v_{i-1})K(v_i)}{\theta_i\,[K(v_{i-1}) + K(v_i)]}, \qquad (7)$$

$$K_i = \frac{K(v_{i-1}) + K(v_i)}{2\theta_i}, \qquad (8)$$


where $\theta_i = 1$ for $i = 1, 2, \ldots, N-1$ and $\theta_i = 0.5$ for $i = 0, N$. Quantities without a "halo" are computed at the step $j + 1$, and quantities with a "halo" at the step $j$. The mesh is assumed to be uniform: $x_i = ih$, $0 \le i \le N$, $t^j = j\tau$.

The distinction from [1] consists of the fact that $v_i$ refers to the half-integral point $i + 1/2$. We assume that $v_{-1} = u_0$, $v_N = u_N$. For this scheme stability occurs for any step $\tau$.
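The three interpolation formulas differ most sharply at a front, where a hot node sits next to a nearly cold one. The sketch below compares them for interior nodes (where $\theta_i = 1$), assuming the power-law conductivity $K(u) = \kappa_0 u^{\sigma}$; the function names and default parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def K(u, kappa0=0.5, sigma=2.0):
    """Power-law thermal conductivity K(u) = kappa0 * u**sigma."""
    return kappa0 * np.asarray(u, dtype=float) ** sigma

def k_midpoint(v_left, v_right, **kw):
    """Formula (6): conductivity evaluated at the mean temperature."""
    return K(0.5 * (v_left + v_right), **kw)

def k_harmonic(v_left, v_right, **kw):
    """Formula (7): harmonic mean of the two nodal conductivities."""
    kl, kr = K(v_left, **kw), K(v_right, **kw)
    total = kl + kr
    safe = np.where(total > 0.0, total, 1.0)   # avoid 0/0 on a zero background
    return np.where(total > 0.0, 2.0 * kl * kr / safe, 0.0)

def k_arithmetic(v_left, v_right, **kw):
    """Formula (8): arithmetic mean of the two nodal conductivities."""
    return 0.5 * (K(v_left, **kw) + K(v_right, **kw))

# A hot node next to a nearly cold one, as at the front of a temperature wave.
hot, cold = 1.0, 1e-3
values = {f.__name__: float(f(hot, cold)) for f in (k_midpoint, k_harmonic, k_arithmetic)}
```

In this situation the harmonic mean is bounded by twice the conductivity of the cold node, so it almost shuts off the heat flux across the front, whereas the other two formulas stay of the order of the hot-node conductivity.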


The system of equations (5) for $i = 0, 1, \ldots, N-1$ at each step $j + 1$ is solved as in [1] by an iterative method. The resulting system of linear equations is solved by pivotal condensation (see, for example, [2]):





V. I. Maslyankin 


$$v_i^{(s+1)} = \alpha_{i+1}\,v_{i+1}^{(s+1)} + \beta_{i+1}, \qquad (9)$$

where $\alpha_{i+1}$ and $\beta_{i+1}$ are pivotal coefficients and $s$ is the number of the iteration. As the zeroth iteration, values from the preceding step are chosen: $v_i^{(0)} = \check v_i$.


The condition for ending the iteration has the form

$$\max_{0 \le i \le N-1} \left| v_i^{(s+1)} - v_i^{(s)} \right| < \varepsilon + \varepsilon_1 v_i^{(s+1)}. \qquad (10)$$

In all the calculations we assumed $\varepsilon_1 = 0$. For each example the actual number of iterations $\nu$ and the so-called "Courant ratio"

$$\gamma = \max\left[K(u)\,\tau/h^2\right], \qquad (11)$$

characterizing the size of the time step, were considered.
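The procedure of this section (implicit scheme, iteration over the nonlinearity, pivotal condensation, stopping test) can be sketched as follows. This is an illustration under stated assumptions rather than the author's program: it takes $\varphi(u) = u$, uses the arithmetic-mean formula for the interface conductivity, re-evaluates the conductivity from the previous iterate, halves the spacing at the two boundary interfaces, and stops when the largest change between iterates falls below `eps` (the case $\varepsilon_1 = 0$). The names `thomas` and `implicit_step` are hypothetical.

```python
import numpy as np

def thomas(lower, diag, upper, rhs):
    """Pivotal condensation for a tridiagonal system (lower[0], upper[-1] unused)."""
    n = diag.size
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def implicit_step(v_old, u_left, u_right, tau, h, K, eps=1e-3, max_iter=100):
    """Advance cell-centred values v_old by one time step; return (v_new, iterations)."""
    n = v_old.size
    theta = np.ones(n + 1)
    theta[0] = theta[-1] = 0.5                 # boundary interfaces are h/2 from the cell centre
    v = v_old.copy()
    for s in range(1, max_iter + 1):
        ext = np.concatenate(([u_left], v, [u_right]))
        k_face = 0.5 * (K(ext[:-1]) + K(ext[1:])) / theta
        A = tau / h**2 * k_face                # one coefficient per interface, n + 1 in total
        diag = 1.0 + A[:-1] + A[1:]
        lower, upper = -A[:-1], -A[1:]
        rhs = v_old.copy()
        rhs[0] += A[0] * u_left                # boundary values move to the right-hand side
        rhs[-1] += A[-1] * u_right
        v_new = thomas(lower, diag, upper, rhs)
        if np.max(np.abs(v_new - v)) < eps:    # stopping test with eps_1 = 0
            return v_new, s
        v = v_new
    return v, max_iter
```

A uniform state that matches both boundary values is a fixed point of the step, which gives a quick sanity check of the assembly and of the elimination.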


3. The difference running wave 


As in [1] we will describe as difference running waves for Eq. (5) all the solutions of the equation

$$\sum_{k=1}^{\alpha} \left[A_{i+1}(v_{i+1} - v_i)\right]^{j+k} + \sum_{k=1}^{\beta} \varphi\!\left(v_{i+k}^{\,j+\alpha}\right) = c_1, \qquad (12)$$

where $\beta \ge 1$, $\alpha \ge 1$ are integers. The solutions of Eq. (12) preserve their profile, shifting after $\alpha$ steps (in $j$) through $\beta$ computing intervals to the right, that is, $v_i^{\,j} = v_{i+\beta}^{\,j+\alpha}$. Therefore, the velocity of motion of all the points of the profile is constant and equals $c = \beta h/(\alpha\tau)$.


The solution of Eq. (12), being a wave moving relative to a quasi-zero background (see Fig. 1), can be selected as follows. Let $\xi_0 = x_{i_0+1/2}$ be the position of the difference front (we recall that the value $v_{i_0}$ refers to the point $i_0 + 1/2$). We will consider that $v_{i_0+1} = \ldots = v_{N-1} = \eta \ge 0$.

Putting $v_{i_0+1} = \eta$, the value of $v_{i_0}$ can be determined from Eq. (2), assuming that the equation holds at the point $x_{i_0+1}$ (here and below we put $\varphi(u) = u$):

$$c = -\frac{(h/\tau)\,A_{i_0+1}(v_{i_0+1} - v_{i_0})}{0.5\,(v_{i_0+1} + v_{i_0})}.$$

From this we obtain

$$A_{i_0+1}\left\{1 - \frac{2\eta}{v_{i_0}} + O\!\left(\frac{\eta^2}{v_{i_0}^2}\right)\right\} = 0.5\,\frac{\beta}{\alpha}.$$


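We will find all the values of $v_i$ for $i < i_0$ from the equation

$$\sum_{k=1}^{\alpha} \left[A_{i+1}(v_{i+1} - v_i)\right]^{j+k} + \sum_{k=1}^{\beta} v_{i+k}^{\,j+\alpha} = c_1,$$

where $c_1 = 0.5\,\beta\,(\eta - v_{i_0})$.

We now consider the various forms of the interpolation formulas for the coefficient $K_i$, $A_i = (\tau/h^2) K_i$, putting $K(u) = \kappa_0 u^{\sigma}$, $\varphi(u) = u$.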


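For (6), assuming $\sigma\eta/v_{i_0} \ll 1$, we have

$$v_{i_0} = 2\,(2\sigma)^{-1/\sigma}\,\tilde u_{i_0}\left[1 + O\!\left(\frac{\eta}{v_{i_0}}\right)\right],$$

where $\tilde u_{i_0}$ is the corresponding value on the analytic running wave (3).

For (8) we have

$$v_{i_0} = \sigma^{-1/\sigma}\,\tilde u_{i_0}\left[1 + O(\eta/v_{i_0})\right].$$

For formula (7) for $\sigma \ge 1$ we have

$$v_{i_0}^{\sigma} \approx \frac{\tilde u_{i_0}^{\sigma}/(4\sigma)}{1 - \tilde u_{i_0}^{\sigma}/(4\sigma\eta^{\sigma})}.$$

Since $v_{i_0} > \eta > 0$, then $\eta > (4\sigma)^{-1/\sigma}\,\tilde u_{i_0}$.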

It is obvious from this that the “background” value of the temperature is at least comparable 
with the value at the front of the difference running wave. It is impossible to construct a difference 
running wave on the zeroth background for the interpolation formula (7). 

We also note that since for large $\sigma$ the expression $\sigma^{1/\sigma} \to 1$, then for the difference running wave defined by (6), the value at the front will be twice the actual value for large values of $\sigma$.
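The leading-order factors quoted in this section are easy to tabulate. The sketch below assumes the expressions $2(2\sigma)^{-1/\sigma}$ for formula (6), $\sigma^{-1/\sigma}$ for formula (8), and the background bound $(4\sigma)^{-1/\sigma}$ for formula (7), each relative to the analytic front value; the function names are illustrative.

```python
def front_ratio_midpoint(sigma):
    """Leading-order ratio of the difference to the analytic front value for formula (6)."""
    return 2.0 * (2.0 * sigma) ** (-1.0 / sigma)

def front_ratio_arithmetic(sigma):
    """Leading-order ratio of the difference to the analytic front value for formula (8)."""
    return sigma ** (-1.0 / sigma)

def min_background_harmonic(sigma):
    """Lower bound on background / analytic front value required by formula (7)."""
    return (4.0 * sigma) ** (-1.0 / sigma)

# One row per exponent: (sigma, ratio for (6), ratio for (8), bound for (7)).
table = [(s, front_ratio_midpoint(s), front_ratio_arithmetic(s), min_background_harmonic(s))
         for s in (1.0, 2.0, 20.0, 200.0)]
```

For $\sigma = 2$ the ratio for formula (6) equals 1, and it approaches 2 as $\sigma$ grows, in agreement with the remark above about large exponents.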


4. Convergence of the iterative process 


We consider a temperature wave running along a quasi-zero background. Let $\varphi(u) = u$, $K(u) = \kappa_0 u^{\sigma}$, $\sigma \ge 1$.

Let $x_{i_0+1/2}$ be the position of the difference front. We will assume that $v_{i_0+1} = \ldots$, where $v_{i_0} \gg v_{i_0+1}$. In the subsequent analysis it is assumed that $v_{i_0+1}$ is any sufficiently small background value. The analysis remains true for the case $v_{i_0+1} = 0$.


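We consider a coefficient of the form (7); then

$$K_{i_0+1} = \frac{2K(v_{i_0})K(v_{i_0+1})}{K(v_{i_0}) + K(v_{i_0+1})}.$$

Since $v_{i_0} \gg v_{i_0+1}$ and $\sigma \ge 1$, then

$$K_{i_0+1} = 2K(v_{i_0+1}) - O\!\left[\frac{K^2(v_{i_0+1})}{K(v_{i_0})}\right].$$

Neglecting terms of the second order of smallness, from the pivotal formulas we obtain

$$\alpha_{i_0+2} \approx \frac{\tau}{h^2}\,K_{i_0+2}, \qquad \beta_{i_0+2} = \check v_{i_0+1} + \beta_{i_0+1}\,\frac{\tau}{h^2}\,K_{i_0+1},$$

$$\Delta v_{i_0+1} = v_{i_0+1} - \check v_{i_0+1} = \frac{\tau}{h^2}\,K_{i_0+1}\,\beta_{i_0+1}.$$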


It is obvious (see (9)) that $\beta_i \le v_0 = v_{\max}$. We put $\beta_{i_0+1} = v_{\max}$. Since the calculation is continued until condition (10) is satisfied, in order that the scheme be "sensitive", that is, that the value of $v_{i_0+1}$ be varied, it is necessary that $\Delta v_{i_0+1}$ be greater than $\varepsilon$ (that is, that condition (10) not be satisfied). Hence

$$v_{\max}\,\frac{\tau}{h^2}\,K_{i_0+1} > \varepsilon.$$

Using expression (11), we obtain

$$2\gamma\,(v_{i_0+1}/v_{\max})^{\sigma} > \varepsilon/v_{\max},$$

$$v_{i_0+1} > \left(\frac{\varepsilon}{2\gamma\,v_{\max}}\right)^{1/\sigma} v_{\max}. \qquad (13)$$

From this it is obvious that the background cannot be a zero one (for $K_i$ defined by formula (7)); moreover, the background value of the temperature cannot be less than some value, which is not always sufficiently small. It is interesting to note that on making the net finer, that is, decreasing the Courant ratio $\gamma$, we will have to increase the background value of the temperature.
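The bound (13) can be evaluated directly. The sketch below assumes it reads $v_{i_0+1} > (\varepsilon / (2\gamma v_{\max}))^{1/\sigma}\, v_{\max}$ and uses the parameters of the first example in section 5 ($\varepsilon = 0.001$, $\sigma = 2$, $\gamma = 5$), with $v_{\max}$ taken as the boundary value $10\,t^{1/2}$ at $t = 0.2$; the function name is illustrative.

```python
def min_background(eps, gamma, v_max, sigma):
    """Smallest background temperature for which formula (7) still moves the front, per (13)."""
    return (eps / (2.0 * gamma * v_max)) ** (1.0 / sigma) * v_max

v_max = 10.0 * 0.2 ** 0.5          # boundary value at the final time t = 0.2
v_star = min_background(eps=1e-3, gamma=5.0, v_max=v_max, sigma=2.0)
```

The result is close to the value $v^* \approx 0.021$ quoted for that example. Because the bound scales as $\gamma^{-1/\sigma}$, a smaller Courant ratio demands a larger background.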


We now consider a coefficient of the form (6). For it the background value of the temperature is unimportant, since in this case

$$K_{i_0+1} = K\!\left(\frac{v_{i_0} + v_{i_0+1}}{2}\right) \approx 2^{-\sigma} K(v_{i_0}).$$

However, for a sufficiently great degree $\sigma$ in the thermal conductivity, $A_{i_0+1}$ may be arbitrarily small. In this case, as before, we can write down the condition for which the scheme will work correctly:

$$v_{\max}\,\frac{\tau}{h^2}\,K(v_{i_0})\,2^{-\sigma} > \varepsilon,$$

or, using (11), we obtain

$$\sigma < \frac{\log_2(v_{\max}\gamma/\varepsilon)}{1 - \log_2(v_{i_0}/v_{\max})}. \qquad (14)$$

As in the case described above (that is, for formula (7)), the smaller $\gamma$, that is, the finer the mesh, the smaller the range of applicability of formula (6).


However, condition (14) is too strong. If it is not satisfied, then the value $v_{i_0+1}$ at the point $x_{i_0+3/2}$ is thereby left unchanged. In this case the heating process is no longer described by Eq. (3), the temperature $v_{i_0}$ at the point $x_{i_0+1/2}$ increases, and, substituting the limiting value $v_{i_0} = v_{\max}$ into (14), we obtain

$$\sigma < \log_2\frac{\gamma\,v_{\max}}{\varepsilon}. \qquad (15)$$

If condition (15) is satisfied then, as the value of $v_{i_0}$ increases, an instant may arrive when the condition of convergence of the iterations (10) for the point $x_{i_0+3/2}$ is violated. As a result a true numerical solution may be obtained. The corresponding numerical calculation is described below.
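Conditions (14) and (15) give an upper limit on the exponent $\sigma$ for which formula (6) remains usable. The sketch below assumes they read $\sigma < \log_2(\gamma v_{\max}/\varepsilon) / (1 - \log_2(v_{i_0}/v_{\max}))$ and $\sigma < \log_2(\gamma v_{\max}/\varepsilon)$, and it borrows the values $\varepsilon = 0.001$, $\gamma = 16$ and the boundary value $(640\,t)^{1/20}$ at $t = 0.25$ from the second example in section 5; the function names are illustrative.

```python
import math

def sigma_limit_15(gamma, v_max, eps):
    """Upper limit on sigma from (15), i.e. (14) with the front value equal to v_max."""
    return math.log2(gamma * v_max / eps)

def sigma_limit_14(gamma, v_max, eps, v_front):
    """Upper limit on sigma from (14) for a front value v_front <= v_max."""
    return math.log2(gamma * v_max / eps) / (1.0 - math.log2(v_front / v_max))

v_max = (640.0 * 0.25) ** (1.0 / 20.0)
sigma_star = sigma_limit_15(gamma=16.0, v_max=v_max, eps=1e-3)
```

The limit comes out a little above 14, consistent with $\sigma^* \approx 14$ quoted in section 5, so an exponent of 20 lies outside the admissible range. Condition (14) is never weaker than (15), since the front value cannot exceed $v_{\max}$.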


5. Examples of the numerical calculations 


1. The following parameters of the calculation were chosen: $\varepsilon = 0.001$, $\sigma = 2$, $\kappa_0 = 0.5$, $c = 5$. Then from (3) the boundary conditions have the form $u(t, 0) = 10\,t^{1/2}$, $u(t, x_N) = 0$. The initial conditions for $t_0 = 0.1$ were chosen by solving (3). The calculation was performed up to $t_3 = 0.2$ with step $\tau = 2 \cdot 10^{-4}$. The mesh was chosen sufficiently fine: $h = 0.02$, $N = 50$. In this case the Courant ratio (11) will be $\gamma = 5.0$. From the inequality (13) we obtain that the background value of the temperature must be greater than $v^* \approx 0.021$.


In the calculations it was chosen equal to 10~>. The analytic solution (continuous curves) 
and the results of the calculation (crosses) by formula (6) are shown in Fig. 2. Everywhere, apart 
from several nodes close to the front, the deviation of the calculated from the exact solution does 
not exceed 0.002, and the number of iterations is $\nu < 3$. Calculation by formula (8) gives practically the same results.





FIG. 2.
Solution: 1 is for $t_0 = 0.10$, 2 is for $t_1 = 0.12$, 3 is for $t_2 = 0.16$, 4 is for $t_3 = 0.20$.






















FIG. 3.
Solution: 1 is for $t_0 = 0.10$, 2 is for $t_1 = 0.12$, 3 is for $t_2 = 0.16$, 4 is for $t_3 = 0.20$.






















FIG. 4. 


The results of the same calculation by formula (7) are shown in Fig. 3. Since the background value of the temperature is much less than $v^*$, then by the previous analysis the convergence condition (10) is satisfied at the node closest to the front. Since the number of iterations is $\nu < 3$, the value of the temperature at this node is practically unchanged and the front remains immobile.

The spatial step was changed in the calculations: the steps $h = 0.04$ and $0.01$ were chosen. The results of these calculations are practically the same as the results obtained for $h = 0.02$.
FIG. 5.
Solution: 1 is for $t_0 = 0.10$, 2 is for $t_1 = 0.15$, 3 is for $t_2 = 0.20$, 4 is for $t_3 = 0.25$.


In the case $h = 0.02$ a calculation was performed in which the initial instant $t_0$ was so chosen that condition (14) was violated. It is easy to see that condition (15) is satisfied. The calculation was carried out using formula (6), and the numerical solution obtained is identical with that considered above (where $t_0 = 0.1$ was chosen). This confirmed the inferences made above concerning formulas (14) and (15).


2. For a calculation illustrating the case where formula (6) does not work the following parameters were chosen: $\varepsilon = 0.001$, $\sigma = 20$, $\kappa_0 = 1/8$, $c = 2$. The boundary conditions from (3) are of the form $u(t, 0) = (640\,t)^{1/20}$, $u(t, x_N) = 0$.


The Courant ratio is y=16, h=0.02, N=25, t=2-10-‘. The initial conditions for 
tg = 0.10 were chosen from the solution of (3). The calculation was performed up to t, = 0.25. 


By the inequality (15), we have the condition $\sigma < \sigma^* \approx 14$, which is obviously not satisfied.

The results of the calculation by formula (6) are given in Fig. 4. The number of iterations is $\nu < 3$; therefore the value of the temperature at the front cannot be varied in practice and the front is immobile.

The analytic solution (continuous curves) and the results of calculation (crosses) by formula (8) are shown in Fig. 5. From $t_0 = 0.10$ to $t_1 = 0.15$ the number of iterations is $\nu < 11$, from $t_1 = 0.15$ to $t_3 = 0.25$ the number of iterations is $\nu < 3$. Everywhere, apart from some nodes close to the front, the deviation of the calculated from the exact solution does not exceed 0.003.
6. Conclusion

From the analysis of formulas (6)-(8) it is obvious that formula (7) is practically useless for calculating temperature waves propagating along a fairly small background, and for calculating processes with a large temperature drop.

Formula (6) is inappropriate for large degrees ($\sigma \gtrsim 20$) in the thermal conductivity. However, even in the case of realistic values of the exponent $\sigma$ (for example, $\sigma = 2$), for sufficiently fine meshes ($\gamma \approx 20$) calculation by formula (8) is advantageous, since it requires less computer time. For example, the calculations carried out in section 5, subsection 1 for $h = 0.02$ required 30% more time when formula (6) was used than when (8) was used.

For coarse meshes formula (6) is more suitable: it enables a more accurate solution to be obtained and, as shown by calculations, on coarse meshes it requires less computer time than formula (8).


I express my thanks to A. A. Samarskii for considerable assistance and for his continued interest.


Translated by J. Berry. 
LOCAL ALGORITHMS ON YABLONSKII SCHEMES*
A. V. KABULOV 
Moscow 
(Received 17 May 1974) 


LOCAL algorithms on information processing systems are discussed. It is shown that the concept of best local algorithm for the introduction of a system of neighborhoods extends to a class of control systems.

In this paper we consider local algorithms for calculating information [1] on the elements of control systems [2].


It will be shown that for the elements of such systems a system of neighborhoods can be 
introduced in such a way that the neighborhoods satisfy fundamental axioms. 


It will also be shown that for neighborhoods in control systems and predicates characterizing 
the properties of elements in systems, the concept of best local algorithm, theorems of the 
existence of the best algorithm and the property of partial ordering in the class of best algorithms 
are preserved. All these results are obtained in section 2 of the present paper. 


In section 1 the fundamental definitions of the theory of control systems, such as networks, memory, elements, subschemes and schemes, are introduced. The interaction between the elements of control systems is described, and brief characteristics of their functioning are given. Section 1 is based on [2].


In the exposition of the results on the theory of local algorithms it is assumed that the reader 
is familiar with the fundamental definitions and theorems of this theory in [1]. 


1. Control systems 


A strict definition of the class of information processing systems, or, as the author calls them, control systems, was given in [2]. We consider below an essential element of control systems, namely schemes.

Following [2] we introduce successively the definitions of networks, memory, elements and subschemes.

The synthesis of these concepts gives the principal object of our investigation, schemes.


1. Networks. Let $\mathfrak{M} = \{a_\gamma\}$ be a set of distinct objects $a_\gamma$, possessing power $m$. Also let $E_0$, $E_i$, $i \ge 1$, be groups of objects of the set $\mathfrak{M}$ (by a group of objects we mean an unordered collection of objects in which repetition is possible), possessing powers, taking into account repetitions, $e_0$ and $e_i$, respectively. We assume that the subscript $i$ runs through a set of natural numbers of power $h$, where different subscripts may correspond to identical groups.

Definition 1. The set $\mathfrak{M}$ with the assigned aggregate of groups $E_0, E_1, \ldots$ is called a network and is denoted by $\mathfrak{M}(E_0, E_1, \ldots)$, if $|E_0| \subseteq \bigcup_{1 \le i \le h} |E_i|$. Here the symbol $|E|$ denotes the set of all the objects of the group $E$. The objects occurring in the composition of the set $\mathfrak{M}$ are called vertices of the network, and the objects of the group $E_0$ are called poles of the network.


In the study of real control systems a fundamental role is played by networks for which the numbers $m$, $h$, $e_i$, $i = 0, 1, \ldots, h$, are natural numbers. Such networks are called finite.


Let $\mathfrak{M}(E_0, E_1, \ldots, E_h)$ be a finite network and $E_i = (a_1^i, \ldots, a_{e_i}^i)$, where $a_j^i \in \mathfrak{M}$, $i = 0, 1, \ldots, h$. With each group $E_i$ we associate a circle in three-dimensional space, and with the objects $a_1^i, \ldots, a_{e_i}^i$ of the group $E_i$ we associate rays emerging from this circle.

With the group $E_0$ we associate in three-dimensional space points, each possessing one ray, each of which corresponds to one of the objects $a_1^0, \ldots, a_{e_0}^0$. We assume that all the rays corresponding to the same objects of the set $\mathfrak{M}$ are connected to each other. The diagram obtained as a result of the construction (Fig. 1) is called the geometrical realization of the network, if: a) each pair of circles occurring in the diagram has no common points; b) the junctions of the rays corresponding to different vertices $a_i$ and $a_j$ have no common points.


2. Memory, elements, elementary subschemes. 


Definition 2. The set $\chi = \{X_\alpha\}$ of distinct objects is called a memory; the objects $X_\alpha$ are called cells.























FIG. 2. 


Semantically a memory is understood as a receptacle for the storage and memorization of 
information. 


Definition 3. The symbol $C_\alpha(\ ,\ ,\ )$, possessing three empty places and a certain number of poles, is called an element, if there is indicated: the number $c_\alpha$ of poles which the symbol $C_\alpha$ has, and cardinal numbers $u_\alpha$, $v_\alpha$, $w_\alpha$, corresponding to the first, second and third empty places.

Here we must note that the numbers $u_\alpha$, $v_\alpha$, $w_\alpha$ may be zeros. Below the elements will be linked, on the one hand with networks, on the other hand with the memory. Indeed, the poles of the element $C_\alpha$ will be put in a one-to-one relation with the objects of the group $E_i$. The latter is possible only on condition that $c_\alpha = e_i$. As for the empty places of the symbol $C_\alpha(\ ,\ ,\ )$, groups $X^\alpha$, $Y^\alpha$, $Z^\alpha$ of cells of some memory $\chi$ will be substituted in them; and then the powers of the groups $X^\alpha$, $Y^\alpha$, $Z^\alpha$ will equal $u_\alpha$, $v_\alpha$, $w_\alpha$ respectively.

Let $\chi$ be some memory, $E_\alpha$ a group of objects of the set $\mathfrak{M}$ and $C_\alpha(\ ,\ ,\ )$ an arbitrary element possessing $c_\alpha$ poles, with whose empty places are associated the numbers $u_\alpha$, $v_\alpha$, $w_\alpha$.


Definition 4. The symbol $C_\alpha = C_\alpha(X^\alpha, Y^\alpha, Z^\alpha)$ is called an elementary subscheme on the memory $\chi$, if with the poles of the element $C_\alpha(\ ,\ ,\ )$ are associated respectively objects of the group $E_\alpha$ possessing the power $c_\alpha$, and for the groups $X^\alpha$, $Y^\alpha$, $Z^\alpha$ of the memory $\chi$, possessing respectively the powers $u_\alpha$, $v_\alpha$, $w_\alpha$, the condition

$$(|X^\alpha| \cup |Y^\alpha|) \cap |Z^\alpha| = \varnothing$$

is satisfied.

Here the group $Z^\alpha$ defines those cells of the memory $\chi$ which are rigidly connected with the given elementary subscheme, these cells containing both information necessary for the operation of the elementary subscheme, and also information arising as a result of its operation. The group $X^\alpha$ selects those cells of the memory $\chi \setminus Z^\alpha$ containing information necessary for the operation of the elementary subscheme $C_\alpha$. Finally, the group $Y^\alpha$ fixes those cells of the memory $\chi \setminus Z^\alpha$ in which information arrives which has emerged thanks to the operation of the elementary subscheme.

Definition 5. Let $C_\alpha$ be an elementary subscheme on the memory $\chi$. We will call the sets $Z^\alpha$ and $\chi \setminus Z^\alpha$, respectively, the internal and external memory of the elementary subscheme $C_\alpha$.


FIG. 2. 


1 corresponds to aj; %, 2 corresponds to aj, 3 corresponds to a® 
Sq 


For clarity (Fig. 2) we will provisionally represent elementary subschemes as a circle from which emerge $c_\alpha$ numbered rays (objects of the group $E_\alpha$), with the symbol $C_\alpha$ at the centre.


3. Schemes. 


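Definition 6. Let $\sigma_\alpha$ be a set of elementary subschemes on the memory $\chi$ and $\mathfrak{M}(E_0, E_1, \ldots)$ be some network. The symbol $\mathfrak{M}(E_0, C_{\alpha_1}, C_{\alpha_2}, \ldots)$ is called a scheme, if it is obtained as the result of substituting into the network $\mathfrak{M}(E_0, E_1, \ldots)$ in place of the groups $E_1, E_2, \ldots$ the elementary subschemes $C_{\alpha_1} = C_{\alpha_1}(X^{\alpha_1}, Y^{\alpha_1}, Z^{\alpha_1})$, $C_{\alpha_2} = C_{\alpha_2}(X^{\alpha_2}, Y^{\alpha_2}, Z^{\alpha_2})$, $\ldots$, where $e_i = c_{\alpha_i}$, $i = 1, 2, \ldots$, and the poles of the elementary subscheme $C_{\alpha_i} = C_{\alpha_i}(X^{\alpha_i}, Y^{\alpha_i}, Z^{\alpha_i})$ are put in a definite correspondence with the vertices of the group $E_i$, $i = 1, 2, \ldots$.

In particular, $\mathfrak{M}(E_0, C_\alpha) = C_\alpha$, where $E_0 = E_1 = E_\alpha$.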







Definition 7. The sets $Z = \bigcup_i |Z^{\alpha_i}|$ and $\chi \setminus Z$ are called the internal and external memory of the scheme $\mathfrak{M}(E_0, C_{\alpha_1}, C_{\alpha_2}, \ldots)$, respectively.

Schemes constructed on networks are conveniently represented geometrically. For this purpose it is sufficient in the geometrical image of the network to write the representation of the elementary subscheme $C_{\alpha_i}$ in the circle representing the group $E_i$.


2. Majorant local algorithms on Yablonskii schemes 


1. Neighborhoods of the elements of control systems. Let a network $\mathfrak{M}(E_0, E_1, \ldots)$ be given with a set of poles $E_0$ and a set of groups $E_1, E_2, \ldots$.

Definition 8. The principal neighborhood of the first order $S_1(E_i, \mathfrak{M})$ of the group $E_i$ of the network $\mathfrak{M}(E_0, E_1, \ldots)$ is defined as all the groups $E_j$ of $\mathfrak{M}$ such that $E_i \cap E_j \ne \varnothing$.

Let the principal neighborhood of the $(k-1)$-th order $S_{k-1}(E_i, \mathfrak{M})$ of the group $E_i$ of the network $\mathfrak{M}(E_0, E_1, \ldots)$ have been defined.

Definition 9. The principal neighborhood of the $k$-th order $S_k(E_i, \mathfrak{M})$ of the group $E_i$ of the network $\mathfrak{M}(E_0, E_1, \ldots)$ is defined as all the groups $E_j$ of $\mathfrak{M}$ for which one of the following conditions is satisfied: 1) $E_j \cap E_s \ne \varnothing$, $E_s \in S_{k-1}(E_i, \mathfrak{M})$; 2) $E_j \subseteq \bigcup_k E_k$, where $E_k$ satisfies condition 1).


If a finite network is given, then we know that for it there exists its own geometrical realization. Therefore the principal neighborhood of the $k$-th order for a group of a network can be defined as the principal neighborhood of the $k$-th order for the vertices of the graph corresponding to the given network.
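The recursive construction of principal neighborhoods can be sketched in a few lines. The sketch assumes that two groups are adjacent when they share at least one object, and that the $k$-th order neighborhood consists of the groups that intersect a member of the $(k-1)$-th order neighborhood, together with the groups whose objects are all covered by such groups. The small network below is hypothetical (it is not the network of Fig. 3), the group of poles is left out for brevity, and all function names are illustrative.

```python
def first_order(groups, i):
    """Indices of all groups sharing at least one object with group i."""
    base = set(groups[i])
    return {j for j, g in groups.items() if base & set(g)}

def next_order(groups, previous):
    """Build the k-th order neighborhood from the (k-1)-th order one."""
    # Condition 1: the group intersects some member of the previous neighborhood.
    cond1 = {j for j, g in groups.items()
             if any(set(g) & set(groups[s]) for s in previous)}
    # Condition 2: the group is contained in the union of groups satisfying condition 1.
    covered = set().union(*(set(groups[j]) for j in cond1))
    cond2 = {j for j, g in groups.items() if set(g) <= covered}
    return cond1 | cond2

def neighborhood(groups, i, k):
    """Principal neighborhood of order k of group i."""
    result = first_order(groups, i)
    for _ in range(k - 1):
        result = next_order(groups, result)
    return result

# Hypothetical chain-like network: each group is a tuple of vertices (repetitions allowed).
groups = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 5), 5: (5, 5, 6)}
```

On this chain each increase of the order adds one more group, until the neighborhood covers the whole network.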

















FIG. 3. 


Example. Let $\mathfrak{M} = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)$ be the network $\mathfrak{M}(E_0, E_1, \ldots, E_{10})$, see Fig. 3. For it
bo= (4, 2, 7, 14) E.=(4, 5, 10)

E,=(1, 2, 6, 10) E;=(6, 6, 10) E.=(8, 9, 10) 

E,=(2, 2, 3) E.= (6, 7, 7) E,=(5, 9, 10, 14) 
E,=(3, 4) 1;=(7, 8) Ey=(10, 10, 10, 10, 10) 


The principal neighborhoods: 
S,(E,, MN) =(E,, Eo, Es, Es, E,0), 
Si (Eyo, Dt) =(Eyo, E:, Es, Es, Es, Es), 
S2.(E,, M) =S2(EL,o, M) =M. 


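Let $\mathfrak{M}(E_0, C_{\alpha_1}, C_{\alpha_2}, \ldots)$ be a scheme on the memory $\chi$, and let $\sigma_\alpha$ be a set of subschemes of the scheme $\mathfrak{M}$.

Definition 10. The principal neighborhood of the first order $S_1(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ of the scheme $\mathfrak{M}$ is defined as the aggregate of the subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ such that $C_{\alpha_i} \cap C_{\alpha_j} \ne \varnothing$ and

$$(|Y^{\alpha_i}| \cap (|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \ne \varnothing) \vee (|Y^{\alpha_j}| \cap (|X^{\alpha_i}| \cup |Z^{\alpha_i}|) \ne \varnothing).$$

Let the principal neighborhood of the $(k-1)$-th order of the subscheme $C_{\alpha_i}$ of the scheme $\mathfrak{M}$ have been defined.

Definition 11. The principal neighborhood of the $k$-th order $S_k(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ in $\mathfrak{M}$ is defined as the aggregate of all the subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ for which one of the following conditions is satisfied:

1) $C_{\alpha_j} \cap C_{\alpha_s} \ne \varnothing$, where $C_{\alpha_s} \in S_{k-1}(C_{\alpha_i}, \mathfrak{M})$, and

$$(|Y^{\alpha_j}| \cap (|X^{\alpha_s}| \cup |Z^{\alpha_s}|) \ne \varnothing) \vee (|Y^{\alpha_s}| \cap (|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \ne \varnothing);$$

2) $C_{\alpha_j} \subseteq \bigcup_s C_{\alpha_s}$ and

$$\Bigl((|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \subseteq \bigcup_s |Y^{\alpha_s}|\Bigr) \vee \Bigl(|Y^{\alpha_j}| \subseteq \bigcup_s (|X^{\alpha_s}| \cup |Z^{\alpha_s}|)\Bigr),$$

where $C_{\alpha_s}$ satisfies condition 1).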


In this definition of the principal neighborhood of the k-th order of a subscheme proximity 
in the network (in the topology) and in the memory is considered simultaneously. But this is not 
always the case. We consider particular cases. 


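Proximity in the network.

Definition 12. The principal neighborhood of the first order $S_1(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ of the scheme $\mathfrak{M}$ is defined as the aggregate of subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ for which the following condition is satisfied: $C_{\alpha_i} \cap C_{\alpha_j} \ne \varnothing$.

Let the principal neighborhood of the $(k-1)$-th order of the subscheme $C_{\alpha_i}$ of the scheme $\mathfrak{M}$ have been defined.

Definition 13. The principal neighborhood of the $k$-th order $S_k(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ of the scheme $\mathfrak{M}$ is defined as the aggregate of all the subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ for which one of the following conditions is satisfied:

1) $C_{\alpha_j} \cap C_{\alpha_s} \ne \varnothing$, where $C_{\alpha_s} \in S_{k-1}(C_{\alpha_i}, \mathfrak{M})$;

2) $C_{\alpha_j} \subseteq \bigcup_s C_{\alpha_s}$, where $C_{\alpha_s}$ satisfies condition 1).

Proximity in the memory.

Definition 14. The principal neighborhood of the first order $S_1(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ of $\mathfrak{M}$ is defined as the aggregate of all the subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ for which the following condition is satisfied:

$$(|Y^{\alpha_i}| \cap (|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \ne \varnothing) \vee (|Y^{\alpha_j}| \cap (|X^{\alpha_i}| \cup |Z^{\alpha_i}|) \ne \varnothing).$$

Let the principal neighborhood of the $(k-1)$-th order of the subscheme $C_{\alpha_i}$ have been defined.

Definition 15. The principal neighborhood of the $k$-th order $S_k(C_{\alpha_i}, \mathfrak{M})$ of the subscheme $C_{\alpha_i}$ of $\mathfrak{M}$ is defined as the aggregate of all the subschemes $C_{\alpha_j}$ of $\mathfrak{M}$ for which one of the following conditions is satisfied:

1) $(|Y^{\alpha_j}| \cap (|X^{\alpha_s}| \cup |Z^{\alpha_s}|) \ne \varnothing) \vee (|Y^{\alpha_s}| \cap (|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \ne \varnothing)$, where $C_{\alpha_s} \in S_{k-1}(C_{\alpha_i}, \mathfrak{M})$;

2) $\bigl((|X^{\alpha_j}| \cup |Z^{\alpha_j}|) \subseteq \bigcup_s |Y^{\alpha_s}|\bigr) \vee \bigl(|Y^{\alpha_j}| \subseteq \bigcup_s (|X^{\alpha_s}| \cup |Z^{\alpha_s}|)\bigr)$, where $C_{\alpha_s}$ satisfies condition 1).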


Below in the proofs of the theorems we will consider proximity simultaneously in the network and in the memory.

Let a set of subschemes $\sigma = \{C_\alpha\}$ of the scheme $\mathfrak{M}$ have been defined, and let $\{\sigma\}$ be a family of sets of subschemes. Let $S_i(C_\alpha, \sigma_j)$ be the principal neighborhood of the $i$-th order of the subscheme $C_\alpha$ in the set $\sigma_j \in \{\sigma\}$.


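Theorem 1
The neighborhoods $S_1(C_\alpha, \sigma_j), S_2(C_\alpha, \sigma_j), \ldots, S_k(C_\alpha, \sigma_j)$ are special neighborhoods.
The proof is by induction with respect to $k$.

1. Let $S_1(C_\alpha, \sigma_j) \subseteq \sigma_j \cap \sigma_q$, $S_1(C_\alpha, \sigma_q) \subseteq \sigma_j \cap \sigma_q$. The neighborhood $S_1(C_\alpha, \sigma_j)$ is composed of all the subschemes $C_{\alpha_s}$ such that $C_\alpha \cap C_{\alpha_s} \ne \varnothing$,

$$(|Y^{\alpha}| \cap (|X^{\alpha_s}| \cup |Z^{\alpha_s}|) \ne \varnothing) \vee (|Y^{\alpha_s}| \cap (|X^{\alpha}| \cup |Z^{\alpha}|) \ne \varnothing).$$

All such $C_{\alpha_s}$ occur in $S_1(C_\alpha, \sigma_q)$, since $S_1(C_\alpha, \sigma_j) \subseteq \sigma_q$. Therefore $S_1(C_\alpha, \sigma_j) \subseteq S_1(C_\alpha, \sigma_q)$. The converse inclusion is proved similarly. Consequently,

$$S_1(C_\alpha, \sigma_j) = S_1(C_\alpha, \sigma_q).$$

2. Let $S_{k-1}(C_\alpha, \sigma_j) = S_{k-1}(C_\alpha, \sigma_q)$, $S_k(C_\alpha, \sigma_j) \subseteq \sigma_j \cap \sigma_q$, $S_k(C_\alpha, \sigma_q) \subseteq \sigma_j \cap \sigma_q$. We show that $S_k(C_\alpha, \sigma_j) = S_k(C_\alpha, \sigma_q)$. The set $S_k(C_\alpha, \sigma_j)$ is composed of subschemes $C_{\alpha_s}$ such that:

1) either $C_{\alpha_s}$ has a non-empty intersection with $C_{\alpha_t}$, where $C_{\alpha_t} \in S_{k-1}(C_\alpha, \sigma_j)$, and

$$(|Y^{\alpha_s}| \cap (|X^{\alpha_t}| \cup |Z^{\alpha_t}|) \ne \varnothing) \vee (|Y^{\alpha_t}| \cap (|X^{\alpha_s}| \cup |Z^{\alpha_s}|) \ne \varnothing);$$

since $S_{k-1}(C_\alpha, \sigma_j) = S_{k-1}(C_\alpha, \sigma_q)$, then $C_{\alpha_t} \in S_{k-1}(C_\alpha, \sigma_q)$;

2) or $C_{\alpha_s}$ is contained in the sum of subschemes $C_{\alpha_t}$ such that $C_{\alpha_t}$ satisfies condition 1).

But all the $C_{\alpha_t}$ occur in $S_k(C_\alpha, \sigma_q)$, and therefore $C_{\alpha_s}$ occurs in $S_k(C_\alpha, \sigma_q)$ and consequently $S_k(C_\alpha, \sigma_j) \subseteq S_k(C_\alpha, \sigma_q)$.

The converse inclusion is proved similarly. The theorem is proved.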


2. The best local algorithm for elements of control systems. Let the memory x and the set of 
elementary subschemes a be fixed. We will consider the scheme 


xi mM , (Ee, Cs Cn a i .) ° 


a), be given, where a= {0, 1, 


let o={C,,,... Cy}. We call the set o°={C;,,",..., C,"}, where 


a x ° v4 
se | 3 "Seip ¢ 
-Y; 


" E x, ‘, 
(EN Y" 2%), wy, =f, 5, Ca spe) * 
permissible for o. 


We call the class $M^* = I(\sigma)$ of all the sets $\sigma^*$ permissible for $\sigma$ the information class of the set $\sigma$ with respect to certain predicates $P_1, \ldots, P_l$.

The neighborhood $S(C_\nu, \sigma)$ determines the neighborhood $S(C_\nu^*, \sigma^*)$, where $\sigma^* \in I(\sigma)$.


Let the functions :,...,@, and the algorithm A, defined by the system of predicates 
P,,...,P,be given, and let the class of algorithms 7, with the same memory be given, where the 
domain of definition is the same for all the algorithms: 


N= {Az Pia. 
Theorem 2


For every class 7, of locally equal algorithms with the same memory there exists a majorant 
algorithm. 


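Proof. Let $I(\sigma)$ be the information class of the set $\sigma$, where $\sigma \in \{\sigma\}$. We consider the set $M^* = \bigcup_\sigma I(\sigma)$.

Let $C_\nu^* \in \sigma^*$, where $\sigma^* \in M^*$, and let $S(C_\nu^*, \sigma^*)$ be the neighborhood of the subscheme $C_\nu^*$ in $\sigma^*$.

We select in $M^*$ all the sets $\sigma_q^*$ such that $C_\nu^* \in \sigma_q^*$ and $S(C_\nu^*, \sigma^*) = S(C_\nu^*, \sigma_q^*)$, and denote the aggregate of all the sets $\sigma_q^*$ by

$$M_S(C_\nu^*). \qquad (1)$$

The aggregate of sets $\sigma$ such that in (1) there exists a $\sigma^*$ from $I(\sigma)$ will be denoted by $M_S(C_\nu)$. We introduce the function $\varphi_i^0$, $i = 1, 2, \ldots, l$, as follows:

$$\varphi_i^0(C_\nu, \alpha_1, \ldots, \alpha_l, S, \sigma^*) = \begin{cases} (\alpha_1, \ldots, \alpha_l), & \text{if } \alpha_i \in \{0, 1\}; \\ (\alpha_1, \ldots, \alpha_{i-1}, \gamma, \alpha_{i+1}, \ldots, \alpha_l), & \text{if } \alpha_i = \Delta \text{ and for all } \sigma \in M_S(C_\nu) \text{ there is satisfied the relation } P_i(C_\nu, \sigma) = \gamma,\ \gamma \in \{0, 1\}; \\ (\alpha_1, \ldots, \alpha_{i-1}, \Delta, \alpha_{i+1}, \ldots, \alpha_l), & \text{if in } M_S(C_\nu)\ \exists\, \sigma_1, \sigma_2 \text{ and } P_i(C_\nu, \sigma_1) \ne P_i(C_\nu, \sigma_2). \end{cases}$$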
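The rule behind the function introduced here can be read as "commit to a value only when every set consistent with the observed neighborhood agrees on it". The toy sketch below illustrates that reading and nothing more: the admissible sets, the predicate and the marker `UNDECIDED` (standing in for the undetermined marker of the text) are hypothetical, and the function name is illustrative.

```python
UNDECIDED = None  # stands in for the "undetermined" marker of the text

def majorant_value(predicate, element, admissible_sets):
    """Return the common predicate value over all admissible sets, else UNDECIDED."""
    values = {predicate(element, s) for s in admissible_sets}
    return values.pop() if len(values) == 1 else UNDECIDED

# Hypothetical predicate: "the element is the smallest member of the set".
def is_minimum(element, candidate_set):
    return int(element == min(candidate_set))

agree = majorant_value(is_minimum, 1, [frozenset({1, 2}), frozenset({1, 3})])
differ = majorant_value(is_minimum, 1, [frozenset({1, 2}), frozenset({0, 1})])
```

A larger neighborhood can only shrink the collection of admissible sets, so a value that is decided for a small neighborhood stays decided for a larger one, which is the monotonicity property used in the proof.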





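We prove the monotonicity of the functions $\varphi_i^0$, $i = 1, 2, \ldots, l$. Let $S_1 = S(C_\nu^*, \sigma_1^*) \subseteq S_2 = S(C_\nu^*, \sigma_2^*)$; then

$$M_{S_1}(C_\nu) \supseteq M_{S_2}(C_\nu). \qquad (2)$$

Indeed, if $\sigma \in M_{S_2}(C_\nu)$, then a $\sigma^*$ exists such that $\sigma^* \in M_{S_2}(C_\nu^*)$. We replace in $\sigma^*$ some markers $\gamma$, $\gamma \in \{0, 1\}$, by $\Delta$ so as to obtain the equation $S(C_\nu^*, \sigma_1^*) = S(C_\nu^*, \sigma^*)$; it is obvious that $\sigma^* \in I(\sigma)$ and $\sigma^* \in M_{S_1}(C_\nu^*)$, but then $\sigma \in M_{S_1}(C_\nu)$. The inclusion (2) is proved.

1. Let $\varphi_i^0(C_\nu, \alpha_1, \ldots, \alpha_{i-1}, \Delta, \alpha_{i+1}, \ldots, \alpha_l, S, \sigma_1^*) = (\alpha_1, \ldots, \alpha_{i-1}, \gamma, \alpha_{i+1}, \ldots, \alpha_l)$, $\gamma \in \{0, 1\}$. It follows from the definition of $\varphi_i^0$ that the equation $P_i(C_\nu, \sigma) = \gamma$ is satisfied for all $\sigma$ from $M_{S_1}(C_\nu)$. But then from (2) we obtain that the equation $P_i(C_\nu, \sigma') = \gamma$ is also satisfied for all $\sigma'$ from $M_{S_2}(C_\nu)$. Therefore

$$\varphi_i^0(C_\nu, \alpha_1, \ldots, \alpha_l, S, \sigma_2^*) = (\alpha_1, \ldots, \alpha_{i-1}, \gamma, \alpha_{i+1}, \ldots, \alpha_l).$$

2. Let $\varphi_i^0(C_\nu, \alpha_1, \ldots, \alpha_l, S, \sigma_2^*) = (\alpha_1, \ldots, \alpha_{i-1}, \Delta, \alpha_{i+1}, \ldots, \alpha_l)$. By the definition of $\varphi_i^0$, in $M_{S_2}(C_\nu)$ there exist sets $\sigma_1$, $\sigma_2$ such that $P_i(C_\nu, \sigma_1) \ne P_i(C_\nu, \sigma_2)$. But (2) implies that $\sigma_1$, $\sigma_2$ are contained in $M_{S_1}(C_\nu)$. Therefore, $\varphi_i^0(C_\nu, \alpha_1, \ldots, \alpha_{i-1}, \Delta, \alpha_{i+1}, \ldots, \alpha_l, S, \sigma_1^*) = (\alpha_1, \ldots, \alpha_{i-1}, \Delta, \alpha_{i+1}, \ldots, \alpha_l)$.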
Paragraphs 1 and 2 imply the monotonicity of $\varphi_i^0$, $i = 1, 2, \ldots, l$.


We show that A={A,”, @,°,...,q@°, S} isa majorant algorithm. Let Bea.. B= 
{A,', Mi,---;Q1, S}, andleta=A, :(C,, a,...,a, S, o°)= 
Biss. oy Ge), yetO, 4). Then @(C,, a,..., ay S, o*) =(a1,..., 
..,@1) .Weassume that By. If Be{0, 1} , then 8=y, since otherwise either yg; or vy; 
transforms the information set o* into a set which is not an information set. 


Let 6 = A. Then it follows from the definition of y,° that the set M,(C,) contains 0, and 04 
such that P;(C,, 6,;) #Pi(C,, 62). 


We isolate in M,(C,!) elements 0, * and 02* such that 6,°=/(0,), o=/(62).  Itis 
obvious that S(C,', o,°) * S(C,', 62") .Therefore gi(Cy, a,....0. S. 6,7) = Pil, &, 
“9 Q1, S, 02") =: (C,, OX, eeey 1, S. gs"). 


From the elements 0, *, 0>* we select that for which y*P;(C,, 6,). 0,5] {6,", 2}. Let 
such an element be o, *. Then the application of y; and the replacement in o, * of the element Cy! 


oat eon Aj_y, 4, Ai+y, sane Sed 


E Phe . 
by the element (C})’ = Cy (X", Y", Z°) takes o, * out of (o,), therefore 


t 


y = A. The theorem is proved. 


Let {0} bea family of sets of subschemes. We suppose that for all (C,, 0), Cv=9: 


o€={o} distinct neighborhoods S'(C,, o), S°(C;, 6), have been introduced, where 
S'(C,, 6) ={a}, S°(C,, 6) ={o} . Let A, and A, be algorithms which are majorant for 7,, 
and ., respectively. 


Theorem 3. 
let S'(C,, o)SS°(C,, 6) , then AySAz. 
Proof. The theorem will be proved if the validity of the following proposition is established. 


Let A,SA (Qi, .-- Pay, gs S), A:SA (Qa, .. +, Paty Py. Hy S), 
Q2i (Cy, 1, 00 9 Hi-1, A, Hittg see, Hi, S’, 0°) =(a, oo ey Hint, A, Hitt, eeey Gi). Then 
Qui (Cy, Qi, e-+, Hi-s, A, Kitty se +y Hi, S*, O°) =(ay,... » Mi-1, A, QHitiyees , G1). 


By the definition of @2i, inthe set M,*(C,) there exist elements 0, and 0, such that 
P;(Cy, 0:)=0, Pi(C,, o2)=1. We show that o,<M,'(C,) and o,=M,'(C,), that is, 
S*(C,, 01) =S? (Cy, 62). Let o°E1 (a). We have S*(Cy, 6) S01, S?(C,, 6)So2, S*(C, 
0;) =S?(C,, 6) =S?(C,, a2), S*(Cy, 6:1) SS°(C,, 6,), S*(C,, 02) SS?(Cy, o2). Therefore 
S'(C,, 6:1) SS?(C,, o)So0,, S*(Cy, 62) SS? (C,, 6) So2. 


From part 3 of the definition of neighborhood (see [1] ) it follows that S*(C,, o,) = S'*(C,, 8, 
(Cy, )), S*(Cy, G2) =S* (Cy, S°(Cy, 0)). 


By hypothesis of the theorem S*(C,, 6)<=(o, S'(C,, S*(C,, 6)) is defined, therefore 
S*(C,, 6) =S*(C,, 62). 


In M,((C,*)’) there exist sets 0, * and 0* such that o,°€J(0,), 62°=J (02), that is 
S*((Cy)’, 0,°)AS*((C,')’, 02"). Therefore S*((C,4)’, 04°) A S*((Cy')’, 2"). The last 
equation and the condition P:(Cy, 6:)#P:(C,, 62) imply that @:'(Cy, a1,...,Qi-1, A,
Qitiy e+e QM, of o*) = (G4, 2 oie Qhi-ty A, ia45 020s 1). 


The theorem is proved. 


Translated by J. Berry 
AN APPROACH TO THE CONSTRUCTION OF OPTIMAL RECOGNITION
ALGORITHMS FOR LARGE CONTROL TABLES* 


A. G. TSERKOVNYI 
Moscow 


(Received 4 May 1975) 


A RECOGNITION algorithm from the class of estimate calculation algorithms is described. 
Approximate formulas for calculating the parameter values of the algorithm are presented. A scheme 
of a method of optimizing this algorithm for working with large training and control tables is 
presented. 


1. Introduction 


The algorithm described belongs to the class of algorithms for the calculation of estimates (voting algorithms) [1]. Its principal features may be regarded as: orientation to the solution of comparatively large-scale problems; the possibility of using features from different alphabets in the initial descriptions; and partial optimization with respect to the parameters of the algorithm, as distinct from standard voting algorithms. As a rule, in known algorithms for the calculation of estimates, optimization of the functional of recognition quality is performed on the whole parameter space of the algorithm. In the present case, because of the large dimension of the problem, it is impossible to use this approach. Therefore the following process of optimization of the algorithm is proposed. First the approximate values of the parameters of the algorithm are calculated and recognition is performed with these values. At the next stage, using the results of the approximate recognition, the parameters of the algorithm are varied by some amount and a study is made of the effect of the variation of specific parameters on the recognition quality. Then a group of parameters is selected whose variation most substantially improves the recognition quality. Finally, the algorithm is optimized on only the selected subspace of parameters.


2. Description of the class of algorithms 


As usual there are two sets of objects, combined in the tables $T^1_{n,m,l}$ and $T^2_{n,m',l}$, 
called the standard and the control table respectively. The subscripts in the notation $T_{n,m,l}$ 
have the following meaning: $n$ is the number of features specifying the description of the object to 
be recognized, $m$ is the number of objects in the corresponding table, $l$ is the number of classes 
to which the objects in the table are assigned. 


The recognition algorithm is specified by four groups of parameters: 


1) the parameters $p_1, \dots, p_n$: the information weights of the features (columns of the 
table $T^1_{n,m,l}$); 


2) the parameters $\gamma_1, \dots, \gamma_m$: the information weights of the objects (rows of the table 
$T^1_{n,m,l}$); 


(The information weight has the meaning defined in [2].) 


3) the parameters $\varepsilon_1, \dots, \varepsilon_n$: the thresholds of accuracy in the comparison of the values 
of the features; 


4) the parameter $\lambda$: the threshold value of the decision rule. 


We assume that we are given the values of all the parameters of the algorithm: $p_1, \dots, p_n$, 
$\gamma_1, \dots, \gamma_m$, $\varepsilon_1, \dots, \varepsilon_n$, $\lambda$. We describe the procedure of recognition of some object $S$ (a row 
$S$ of $T^2_{n,m',l}$), that is, its assignment to one of the classes $\{K_0, K_1, \dots, K_l\}$ (if the object is 
assigned to the class $K_0$, this means that the algorithm fails to recognize the given object). The 
object $S$ of the control set is assigned to one of the classes by sequential comparison with all the 
objects of the standard set and by calculation of a definite family of estimates. 


Therefore, let us compare an object $S$ of the control set and an object $S_t$ of the standard set. 
The descriptions of these objects are specified by the following values of the features: $S = \{\alpha_1, \dots, \alpha_n\}$ 
and $S_t = \{\beta_1, \dots, \beta_n\}$. 


We will consider that the values of the corresponding features are identical if 

$$\rho(\alpha_i, \beta_i) \le \varepsilon_i,$$

where $\rho(\alpha, \beta)$ is a numerical function satisfying the conditions $\rho(\alpha, \beta) = \rho(\beta, \alpha)$, $\rho(\alpha, \alpha) = 0$, 
$\rho(\alpha, \beta) > 0$ if $\alpha \ne \beta$. 


We assume that for the specified objects (the rows $S$ and $S_t$) the features with numbers 
$i_1, \dots, i_v$ were identical (in the sense defined above). The proximity function of these rows can 
be determined in the form $r(S, S_t) = n - v$. 


We then construct estimates similar in meaning to the estimates introduced in the standard 
voting procedures [1]. 


1. The row by row estimate: 

$$\Gamma(S, S_t) = \gamma_t (p_{i_1} + \dots + p_{i_v}),$$

where $\gamma_t$ is the information weight of the row $S_t$, and $p_{i_1}, \dots, p_{i_v}$ are the information weights 
of the features whose values are identical in the rows $S$ and $S_t$. 


2. The estimate of the row $S$ for the class $K_j$: 

$$\Gamma_j(S) = \frac{1}{m_j - m_{j-1}} \sum_{S_t \in K_j} \Gamma(S, S_t), \quad j = 1, 2, \dots, l,$$

where $m_j - m_{j-1}$ is the number of rows in the class $K_j$. 


3. The decision rule. After the estimates $\Gamma_1(S), \dots, \Gamma_l(S)$ of the row $S$ have been 
calculated for each of the classes, we use the decision rule $F$ given in the following form: 

$$F[\Gamma_1(S), \dots, \Gamma_l(S)] = \begin{cases} u, & \text{if } \Gamma_u(S) - \Gamma_j(S) \ge \lambda, \ j \ne u, \ j = 1, 2, \dots, l, \\ 0 & \text{otherwise.} \end{cases}$$


Therefore, we assign the row $S$ of the control table to one of the classes $\{K_0, \dots, K_l\}$. 
Calculating the corresponding estimates and applying the decision rule successively for each of the 
objects of the control table, we perform the recognition of all the objects of the table. To estimate 
the recognition performed it is necessary to introduce a functional of quality of recognition with 
given parameter values, which can be written in the following form: 

$$\varphi = m^*/m',$$

where $m^*$ is the number of correctly recognized objects, and $m'$ is the total number of objects in 
the control table. 
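
The recognition procedure of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the author's implementation: the comparison function is taken to be $\rho(\alpha, \beta) = |\alpha - \beta|$, the class labels are stored in a separate list, the decision rule is read as requiring a margin of at least $\lambda$ over every other class, and all names (`row_estimate`, `class_estimates`, `decide`, `recognize`, `quality`) are hypothetical.

```python
def row_estimate(s, s_t, gamma_t, p, eps):
    # Gamma(S, S_t): weight of the standard row times the summed weights
    # of the features that agree within their thresholds (rho = |a - b| assumed).
    return gamma_t * sum(p_i for a, b, p_i, e in zip(s, s_t, p, eps)
                         if abs(a - b) <= e)


def class_estimates(s, standard, labels, gamma, p, eps, n_classes):
    # Gamma_j(S) for j = 1..l: mean row estimate over the rows of class K_j.
    # Index 0 is unused and kept only so that indices match class numbers.
    totals = [0.0] * (n_classes + 1)
    counts = [0] * (n_classes + 1)
    for row, j, g in zip(standard, labels, gamma):
        totals[j] += row_estimate(s, row, g, p, eps)
        counts[j] += 1
    return [totals[j] / counts[j] if counts[j] else 0.0
            for j in range(n_classes + 1)]


def decide(estimates, lam):
    # Class u wins if it beats every other class by at least lam;
    # 0 stands for K_0, a refusal to recognize.
    classes = range(1, len(estimates))
    for u in classes:
        if all(estimates[u] - estimates[j] >= lam for j in classes if j != u):
            return u
    return 0


def recognize(control, standard, labels, gamma, p, eps, lam, n_classes):
    return [decide(class_estimates(s, standard, labels, gamma, p, eps, n_classes), lam)
            for s in control]


def quality(predicted, true_labels):
    # phi = (number of correctly recognized objects) / m'
    return sum(a == b for a, b in zip(predicted, true_labels)) / len(true_labels)


# Toy data (made up for illustration): two classes, two features.
standard = [[0, 0], [0, 1], [5, 5], [5, 6]]
labels = [1, 1, 2, 2]
predicted = recognize([[0, 1], [5, 5], [3, 3]], standard, labels,
                      gamma=[1.0] * 4, p=[0.5, 0.5], eps=[1, 1],
                      lam=0.5, n_classes=2)
```

In the toy data the third control row is far from every standard row, so all class estimates vanish and the rule returns 0, the refusal class $K_0$.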


By varying the specific values of the parameters of the algorithm we obtain different values 
of the quality functional $\varphi$. Below we describe the calculation of the approximate values of the 
parameters of the algorithm, with which the first stage of the recognition procedure is performed. 


3. Approximate formulas for selecting the parameters of the algorithm 


1. The calculation of the parameters $\varepsilon_i$. We first note that as $\rho(\alpha, \beta)$ for comparing the 
values of the features we can take the simple function $\rho(\alpha, \beta) = |\alpha - \beta|$, but in this case the 
scale of values of the features may require recoding. 


Therefore, we consider a procedure for calculating the accuracy threshold $\varepsilon_i$ for the feature 
number $i$. Let the feature $i$ be able to assume one of the values $\{d_1, \dots, d_s\}$. We extract from 
the tables $T^1_{n,m,l}$ and $T^2_{n,m',l}$ only the $i$-th columns, that is, only the values assumed by the 
$i$-th feature in the objects described. We will consider that we are given the tables $T^1_{1,m,l}$ and $T^2_{1,m',l}$, 
in which the descriptions of the objects are determined by only the one feature $i$, and we carry out 
the procedure of recognition of one column from the other with different values of the parameter $\varepsilon_i$. 
It is obvious that as possible values of $\varepsilon_i$ it is advisable to take only $\varepsilon_i = d_0, d_1, \dots, d_s$, where 
$d_0 = 0$. As a result of the calculations performed we obtain a matrix $A = \|a_{ij}\|_{(s+1) \times s}$, where the 
rows correspond to different values of $\varepsilon_i$, and the columns to different possible values of the feature 
$i$, which are recognized with the given $\varepsilon_i$. The element $a_{ij}$ of the matrix $A$ represents the number of 
correctly recognized values $d_j$ of the feature for some $\varepsilon = \varepsilon_i$. It should be noted that when the 
conditions 

$$d_j < d_1 + \varepsilon_i, \quad d_j > d_s - \varepsilon_i$$

are satisfied, the element $a_{ij} = 0$, since the value of $\varepsilon_i$ is so great that it spans the whole possible scale 
of values for the feature $i$, and in this case correct recognition for $d_j$ is impossible. Therefore, for the 
calculation of the best value of $\varepsilon_i$ it is necessary to consider not all the $s \times (s + 1)$ elements of the 
matrix $A$, but rather fewer. 


Summing the values of the elements of the matrix $A$ along the rows, we obtain the total 
number of correctly recognized possible values of the given feature. As an approximate $\varepsilon_i$ we 
choose that which corresponds to the maximum number of correctly recognized values of the feature 
$i$. If there are several such $\varepsilon_i$, then we choose the greatest of them, since in this case for the same 
quality of recognition we specify less strict conditions on the coincidence of various possible values 
of the feature $i$. 


2. Calculation of the parameters $p_i$. After determining the best $\varepsilon_i$ we find $p_i$, the information 
weight of the feature $i$, also using the matrix $A$, by the following formula: 

$$p_i = \frac{q_i}{m' \log_2 s},$$

where $q_i$ is the maximum number of correctly recognized values of the feature $i$, in other words, the sum 
of the row of the matrix $A$ corresponding to the best $\varepsilon_i$; $s$ is the number of possible values of the 
scale of the feature $i$; $m'$ is the number of objects in the control table. (In this formula $\log_2$ is used, 
since we consider the information weight per 1 information bit.) 
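
The choice of the threshold and of the feature weight from the matrix $A$ can be illustrated as follows. This is a hedged sketch: the function names are hypothetical, the matrix in the example is invented, and the weight formula is read here as $q_i$ divided by $m' \log_2 s$ (a weight per bit), which is an assumption about how the printed formula groups its terms.

```python
import math


def best_threshold(A, thresholds):
    # Row sums of A count the correctly recognized values for each candidate
    # threshold; among the rows with the largest sum, the greatest threshold wins.
    sums = [sum(row) for row in A]
    best = max(sums)
    eps = max(t for t, total in zip(thresholds, sums) if total == best)
    return eps, best


def feature_weight(q, m_control, s):
    # Assumed reading of the formula: p_i = q_i / (m' * log2(s)).
    return q / (m_control * math.log2(s))


# Invented example: s = 4 feature values, candidate thresholds d_0..d_4 = 0..4.
A = [[3, 2, 1, 0],
     [4, 3, 2, 1],
     [4, 3, 2, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
eps_i, q_i = best_threshold(A, [0, 1, 2, 3, 4])
p_i = feature_weight(q_i, m_control=10, s=4)
```

Rows 1 and 2 tie with the same sum, so the larger threshold is selected, in line with the tie-breaking argument given in the text.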


3. Calculation of the parameters $\gamma_j$. Having the values of the parameters $\varepsilon_i$ and $p_i$, we put 
$\gamma_j = 1$, $j = 1, 2, \dots, m$, and perform the recognition procedure with these parameters. Then the 
$\gamma_j$ can be calculated by the following formula: 

$$\gamma_j = \sum_{i \in K_t} \Gamma(S_j, S_i') \Big/ \sum_{i \notin K_t} \Gamma(S_j, S_i');$$

here the row $S_j$ belongs to the class $K_t$. This definition of $\gamma_j$ treats the information weight of the 
object $S_j$ as the ratio of the estimate of the "proximity" of the object to its class to the estimate of 
the "proximity" to all the remaining classes. 


4. Calculation of the parameter $\lambda$. We perform the recognition procedure with the values 
of $\varepsilon_i$, $p_i$ and $\gamma_j$ previously determined. We specify the decision rule as follows: 

$$F[\Gamma_1(S), \dots, \Gamma_l(S)] = \begin{cases} u, & \text{if } \Gamma_u(S) - \Gamma_j(S) > 0, \ u \ne j, \ j = 1, 2, \dots, l, \\ 0 & \text{otherwise.} \end{cases}$$

We will simultaneously calculate the quantity 

$$\lambda_r = \Gamma_u(S_r) - \max_{j \ne u} \Gamma_j(S_r).$$

As the best value of $\lambda$ we choose $\lambda = \min_r \lambda_r$, where $r$ runs through only the correctly recognized 
objects. 
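
A minimal sketch of this choice of the threshold follows. The data layout (one dictionary of class estimates per control object) and the names `margin` and `choose_lambda` are assumptions made for the illustration. An object counts as correctly recognized when the estimate for its own class exceeds all the others, which is the zero-threshold decision rule above.

```python
def margin(estimates, u):
    # lambda_r = Gamma_u(S_r) - max over j != u of Gamma_j(S_r)
    return estimates[u] - max(v for j, v in estimates.items() if j != u)


def choose_lambda(all_estimates, true_classes):
    margins = [margin(e, u) for e, u in zip(all_estimates, true_classes)]
    # Only objects recognized correctly by the zero-threshold rule take part.
    correct = [m for m in margins if m > 0]
    return min(correct) if correct else 0.0


# Invented estimates for three control objects and two classes.
estimates = [{1: 0.9, 2: 0.4}, {1: 0.3, 2: 0.8}, {1: 0.5, 2: 0.6}]
lam = choose_lambda(estimates, true_classes=[1, 2, 1])
```

The third object is misrecognized (its margin is negative), so it does not influence the threshold.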


Having determined by the procedure described approximate values of the parameters of the 
algorithm $p_1, \dots, p_n$, $\gamma_1, \dots, \gamma_m$, $\varepsilon_1, \dots, \varepsilon_n$, $\lambda$, we carry out the recognition of the table 
$T^2_{n,m',l}$ by the table $T^1_{n,m,l}$ with these values of the parameters. To this recognition there 
corresponds a definite value of the quality functional. The next problem is to vary the 
parameters in such a way as to improve the quality of recognition. 


4. Necessary conditions for the values of the variation of parameters 


Because of the great dimension of the problem it is difficult to find an algorithm extremal 
with respect to all the parameters. 


Therefore a method is proposed enabling us to find a subset of parameters whose variation 
has the most substantial effect on the improvement of the quality of recognition. The idea is as 
follows: varying the parameters subject to the condition that the variation does not worsen 
the original recognition, we ascertain which parameters, when varied, most substantially 
improve the recognition of the objects obtained at the first stage. 


Therefore, we take one of the classes of objects, let us say the class $K_1$. The objects $S$ of this 
class after the first stage are divided into two subsets: correctly recognized, that is, those for 
which 

$$\Gamma_1(S) - \max_{j \ne 1} \Gamma_j(S) \ge \lambda,$$

and incorrectly recognized, which include also objects which the algorithm has refused to recognize. 


We vary the values of the information weights of the features $p_i$ by some quantity $\Delta p_i$ and those of the 
information weights of the objects $\gamma_j$ by $\Delta\gamma_j$. 


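Therefore, the previously introduced value $\Gamma(S, S_j)$ of the estimate of the row $S$ for the 
row $S_j$ changes its value to $\Gamma'(S, S_j)$: 

$$\begin{aligned} \Gamma'(S, S_j) &= (\gamma_j + \Delta\gamma_j)(p^j_{i_1} + \Delta p^j_{i_1} + \dots + p^j_{i_v} + \Delta p^j_{i_v}) \\ &= \gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) + \Delta\gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) \\ &= \Gamma(S, S_j) + \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) + o(\Delta), \end{aligned} \qquad (1)$$

where $\gamma_j$ is the information weight of the row $S_j$, and $p^j_{i_1}, \dots, p^j_{i_v}$ are the information weights 
of the features for which the rows $S$ and $S_j$ agree. 


Neglecting the last term in (1), we obtain the increment $\eta_j(S)$ of the estimate of the row $S$ 
for the row $S_j$: 

$$\eta_j(S) = \Gamma'(S, S_j) - \Gamma(S, S_j) = \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}).$$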
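
The linearization can be checked numerically. The sketch below (hypothetical function names, invented numbers) computes the exact change of a row estimate and the first-order increment; their difference is the neglected second-order term, the product of the weight variation of the row and the summed variations of the feature weights.

```python
def row_estimate(gamma, weights):
    # Gamma(S, S_j) = gamma_j * (sum of weights of the agreeing features)
    return gamma * sum(weights)


def increment(gamma, weights, d_gamma, d_weights):
    # First-order part: d_gamma * sum(p) + gamma * sum(dp)
    return d_gamma * sum(weights) + gamma * sum(d_weights)


gamma, weights = 2.0, [0.5, 0.25]
d_gamma, d_weights = 0.1, [0.01, 0.02]

exact = (row_estimate(gamma + d_gamma, [p + dp for p, dp in zip(weights, d_weights)])
         - row_estimate(gamma, weights))
eta = increment(gamma, weights, d_gamma, d_weights)
remainder = exact - eta   # equals d_gamma * sum(d_weights) up to rounding
```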


In the class $K_1$ let the objects $1, 2, \dots, r$ have been recognized correctly, and the objects 
$r+1, \dots, m_1$ incorrectly. We have varied the values of the parameters $p_i$ and $\gamma_j$. It is obviously 
necessary that the correctly recognized objects with the numbers $1, 2, \dots, r$ should be correctly 
recognized with the new parameter values also. Therefore, a condition of the form 

$$\max(\Gamma_j, \Gamma_j') - \min(\Gamma_j, \Gamma_j') \le \lambda$$

must be satisfied, where $j$ runs through the correctly recognized rows. Using this necessary 
condition, we construct a system of inequalities for the increments of the estimates for the class 
$K_1$ for the correctly recognized rows: 

$$\frac{1}{m_1} \sum_{j=1}^{m_1} \eta_j(S_t) < \lambda, \quad t = 1, \dots, r, \qquad (2)$$

or in expanded form 

$$\frac{1}{m_1} \sum_{j=1}^{m_1} \left[ \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) \right] < \lambda, \quad t = 1, \dots, r,$$

where $m_1$ is the number of objects in the class $K_1$. 


We have obtained a system of linear inequalities in $\Delta p_i$ and $\Delta\gamma_j$. After solving system (2), 
we obtain some constraints imposed on the $\Delta p_i$, $\Delta\gamma_j$. These constraints specify the domain of 
possible variations of the parameters $p_i$ and $\gamma_j$. It should be noted that on solving a system of the 
type (2) for only one class $K_1$, we obtain constraints possibly not on all the parameters $p_i$, and only 
on the parameters $\gamma_j$ relating to objects occurring in the class $K_1$. By constructing systems of linear 
inequalities (2) for the sets of correctly recognized objects of all the classes $K_1, \dots, K_l$ and uniting 
the resulting domains of variation of the parameters, we find some common domain of possible 
variations of the parameters of the algorithm $p_i$ and $\gamma_j$, characterized by the fact that the variation 
of the parameters in this domain, in any case, does not worsen the already attained recognition 
quality. The further aim is that the variation of the parameters in this domain should improve the 
recognition of the objects which at the first stage were not referred to their proper class. 


We write down formally the conditions necessary for the correct recognition of each of the 
objects not previously recognized. These conditions consist of the satisfaction of the decision rule 
for each of the objects with the new values of the parameters of the algorithm (that is, the estimate 
for its own class must be greater than the estimates for any other class taking into account the 
threshold value $\lambda$). Returning once more to the class $K_1$, we construct a system of inequalities (3) 
for the first of the incorrectly recognized objects, whose number is $r + 1$: 

$$\frac{1}{m_1} \sum_{j \in K_1} \eta_j(S_{r+1}) - \frac{1}{m_g} \sum_{j \in K_g} \eta_j(S_{r+1}) > \Gamma_g - \Gamma_1 + \lambda, \quad g = 2, \dots, l, \qquad (3)$$

or in expanded form 

$$\frac{1}{m_1} \sum_{j \in K_1} \left[ \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) \right] - \frac{1}{m_g} \sum_{j \in K_g} \left[ \Delta\gamma_j (p^j_{i_1} + \dots + p^j_{i_v}) + \gamma_j (\Delta p^j_{i_1} + \dots + \Delta p^j_{i_v}) \right] > \Gamma_g - \Gamma_1 + \lambda, \quad g = 2, \dots, l, \qquad (4)$$

where $m_g$ is the number of objects in the class $K_g$, and $\Gamma_g$ are the row estimates obtained for the 
corresponding class $K_g$ at the first stage of recognition. 


Similar systems must be constructed for each of the previously incorrectly recognized objects. 
We check each of the systems of linear inequalities of the form (4) for consistency. Objects to 
which consistent systems of inequalities correspond constitute a set of rows for which correct 
recognition is "potentially" possible. We solve each of the consistent systems (4) with the constraints 
on the unknowns $\Delta p_i$ and $\Delta\gamma_j$ obtained from the solutions of the systems of inequalities (2). In each 
case, if for some $\Delta p_i$ or $\Delta\gamma_j$ there is a non-zero solution satisfying the constraints, the 
corresponding parameter $p_i$ or $\gamma_j$ receives a mark. 


As a result, to each of the parameters $p_i$, $\gamma_j$ of the recognition algorithm there corresponds a 
certain number of marks, which indicates how many times a non-zero variation of this parameter 
has given an improved recognition quality. Therefore, in the conditions of the multidimensional 
problem, we take the variation of those parameters which have obtained the greatest number of 
marks to be the most efficient from the point of view of the improvement of recognition. 


Translated by J. Berry. 
THE CONVERGENCE OF MONOTONIC ITERATIVE PROCESSES* 
A. Yu. OSTROVSKII 
Moscow 


(Received 7 July 1975; revised 3 May 1976) 


ESTIMATES are given of the convergence of monotonic iterative processes intended for solving 
problems of the form $x = A_1 x - A_2 x + f$ subject to the conditions $A_1 > 0$, $A_2 > 0$, $\|A_1 + A_2\| < 1$. 


Many technical problems reduce to solving systems of linear algebraic equations of the form 

$$x = Ax + f, \qquad (1)$$

where $A$ is a matrix, all of whose coefficients $a_{ij}$ are strictly positive, and $\|A\| < 1$. Such systems 
are often solved by using the classical process of successive approximation 

$$x_{k+1} = A x_k + f.$$


Here if the norm of the matrix A is close to unity, as is often the case in applied problems, then 
this process converges so slowly that it cannot be regarded as really suitable for calculations. 
However, if the spread of the coefficients of the matrix is comparatively small, then there exists 
for the solution of problem (1) an efficient method of accelerating the convergence, proposed in 
[1] (see also [2] ). 


In this paper the rate of convergence of this method is estimated. The estimates are given in 
terms of the so-called $\theta$-norm of the matrix $A$, introduced and investigated in [3]. 


1. We consider Eq. (1), where $A$ is a positive matrix with norm less than unity. The iterative 
process proposed in [1] is constructed as follows. Initial approximations $u_0$ and $v_0$ satisfying the 
relations 

$$u_0 \le v_0, \quad u_0 \le A u_0 + f, \quad A v_0 + f \le v_0$$

are chosen (here and below the notation $u \le v$ means that the components of the vector $v - u$ are 
non-negative). A number of simple methods exist for finding such $u_0$, $v_0$. In particular, if 

$$\|A\| = \max_i \sum_{j=1}^{n} a_{ij},$$


we can take 


Uyp=—SZ, Vo=!z, 
where 


*Zh. vychisl. Mat, mat. Fiz., 17, 1, 233-238, 1977. 
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n n it 


a[e(Qort) mele (-Be) 7} 


jms jm 
The successive approximations $u_k$, $v_k$ and $u_k^*$, $v_k^*$ are found by the following formulas: 

$$u_0^* = u_0, \quad v_0^* = v_0,$$
$$u_{k+1} = A u_k^* + f, \quad v_{k+1} = A v_k^* + f, \qquad (2)$$
$$u_{k+1}^* = \frac{1}{1 + p_{k+1}} (u_{k+1} + p_{k+1} v_{k+1}), \quad v_{k+1}^* = \frac{1}{1 + q_{k+1}} (v_{k+1} + q_{k+1} u_{k+1}),$$
$$k = 0, 1, 2, \dots$$

Here the non-negative parameters $p_{k+1}$, $q_{k+1}$ are so chosen that at each step the relations 

$$u_k^* \le u_{k+1}, \quad v_{k+1} \le v_k^* \qquad (3)$$

are satisfied. As is shown in [1], the resulting successive approximations converge to the solution $x$ 
of problem (1). The process (2) is monotonic: 

$$u_{k-1} \le u_k \le u_k^* \le x \le v_k^* \le v_k \le v_{k-1}. \qquad (4)$$

Since the cone of non-negative vectors in $R^n$ is normal [2], (4) implies the inequalities 

$$\|u_k^* - x\| \le c \|u_k^* - v_k^*\|, \quad \|x - v_k^*\| \le c \|u_k^* - v_k^*\|,$$

where $c$ is the constant of semimonotonicity of the norm. In particular, if $\|x\| = \max_i |x_i|$, then 
$c = 1$. Therefore the rate of convergence of the iterative process investigated can be 
characterized by the rate of decrease of the quantity $\|u_k^* - v_k^*\|$. It will be shown below that for 
a judicious choice of the parameters $p_k$, $q_k$ the value of $\|u_k^* - v_k^*\|$ decreases at the rate of a 
geometric progression. 


2. Let $x$, $y$ be vectors with non-negative components. We define the numbers $\alpha(x, y)$, $\beta(x, y)$ 
by the equations 

$$\alpha(x, y) = \max\{\alpha : \alpha \ge 0, \ x \ge \alpha y\}, \quad \beta(x, y) = \max\{\beta : \beta \ge 0, \ y \ge \beta x\}$$

and put 

$$\theta(x, y) = (\alpha(x, y)\beta(x, y))^{-1}.$$

We note that if $x$, $y$ are vectors with positive components, then 

$$\alpha(x, y) = \min_i (x_i / y_i) > 0, \quad \beta(x, y) = \min_i (y_i / x_i) > 0$$

and therefore $\theta(x, y) < \infty$. A linear positive operator is called [3] a focussing operator, if 

$$\theta(Ax, Ay) \le C^2, \quad C > 0, \quad x \ge 0, \ y \ge 0, \quad Ax, Ay \ne 0. \qquad (5)$$

The least of the numbers $C$ for which (5) is satisfied is called the $\theta$-norm of the operator $A$ and is 
denoted by $\theta(A)$. 

Every matrix with positive coefficients is a focussing operator with respect to the cone of 
non-negative vectors in $R^n$, and 

$$\theta(A) = \left( \max_{i, j, s, t} \frac{a_{is} a_{jt}}{a_{js} a_{it}} \right)^{1/2}.$$
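
These quantities are easy to compute for small examples. The sketch below reads the definition as $\theta(x, y) = (\alpha(x, y)\beta(x, y))^{-1}$ and the matrix formula as the square root of the largest ratio $a_{is}a_{jt}/(a_{js}a_{it})$; both readings, and the function names, are assumptions of the illustration. It is restricted to vectors and matrices with strictly positive entries.

```python
from itertools import product
from math import sqrt


def alpha(x, y):
    # alpha(x, y) = max{a >= 0 : x >= a*y} = min_i x_i / y_i for positive vectors
    return min(xi / yi for xi, yi in zip(x, y))


def theta_vectors(x, y):
    # theta(x, y) = 1 / (alpha(x, y) * beta(x, y)), where beta(x, y) = alpha(y, x)
    return 1.0 / (alpha(x, y) * alpha(y, x))


def theta_matrix(A):
    # theta(A) for a square matrix with positive entries
    n = range(len(A))
    return sqrt(max(A[i][s] * A[j][t] / (A[j][s] * A[i][t])
                    for i, j, s, t in product(n, n, n, n)))


A = [[0.4, 0.5], [0.5, 0.4]]
theta_A = theta_matrix(A)                            # close to 1.25
# Images of the unit vectors e_1 and e_2 attain the bound theta(Ax, Ay) <= theta(A)^2.
theta_images = theta_vectors([0.4, 0.5], [0.5, 0.4])
```

The search over all index quadruples costs $O(n^4)$ operations, which is acceptable only for small matrices; cheaper upper bounds are usually preferable in practice.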


Theorem 1 


In the iterative process (2) let the parameters $p_k$, $q_k$ be defined by the relations 

$$p_k = \max\{p : p \ge 0, \ u_k - u_{k-1}^* \ge p (v_{k-1}^* - v_k)\},$$
$$q_k = \max\{q : q \ge 0, \ v_{k-1}^* - v_k \ge q (u_k - u_{k-1}^*)\}. \qquad (6)$$

Then the following inequality holds: 

$$\|u_{k+1}^* - v_{k+1}^*\| \le \frac{\theta(A) - 1}{\theta(A) + 1} \|A\| \, \|u_k^* - v_k^*\|. \qquad (7)$$


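Proof. If $u_k^* = u_{k+1}$ ($v_k^* = v_{k+1}$) for some $k = \bar{k}$, then the vector $u_k^*$ ($v_k^*$) is the solution 
of problem (1), and the process is complete. Therefore we may assume that for $k < \bar{k}$ 

$$u_k^* \ne u_{k+1}, \quad v_{k+1} \ne v_k^*. \qquad (8)$$

Formulas (2) imply the equations 

$$\begin{aligned} v_{k+1}^* - u_{k+1}^* &= \frac{1}{1 + q_{k+1}} (A v_k^* + f + q_{k+1} A u_k^* + q_{k+1} f) - \frac{1}{1 + p_{k+1}} (A u_k^* + f + p_{k+1} A v_k^* + p_{k+1} f) \\ &= A \left[ \left( \frac{1}{1 + q_{k+1}} - \frac{p_{k+1}}{1 + p_{k+1}} \right) v_k^* - \left( \frac{1}{1 + p_{k+1}} - \frac{q_{k+1}}{1 + q_{k+1}} \right) u_k^* \right], \end{aligned}$$

$$v_{k+1}^* - u_{k+1}^* = \frac{1 - p_{k+1} q_{k+1}}{(1 + p_{k+1})(1 + q_{k+1})} A (v_k^* - u_k^*).$$

Consequently, 

$$\|v_{k+1}^* - u_{k+1}^*\| \le \frac{1 - p_{k+1} q_{k+1}}{(1 + p_{k+1})(1 + q_{k+1})} \|A\| \, \|u_k^* - v_k^*\|. \qquad (9)$$

We find an upper bound of the numerical factor on the right side of the inequality (9). By the same 
formulas (2), 

$$u_{k+1} - u_k^* = A u_k^* + f - u_k^* = \frac{1}{1 + p_k} A (u_k + p_k v_k) + f - \frac{1}{1 + p_k} [(A u_{k-1}^* + f) + p_k (A v_{k-1}^* + f)],$$

whence 

$$u_{k+1} - u_k^* = \frac{1}{1 + p_k} A [(u_k - u_{k-1}^*) - p_k (v_{k-1}^* - v_k)].$$

Similarly, 

$$v_k^* - v_{k+1} = \frac{1}{1 + q_k} A [(v_{k-1}^* - v_k) - q_k (u_k - u_{k-1}^*)].$$

We put 

$$g_k = \frac{1}{1 + p_k} [(u_k - u_{k-1}^*) - p_k (v_{k-1}^* - v_k)], \quad h_k = \frac{1}{1 + q_k} [(v_{k-1}^* - v_k) - q_k (u_k - u_{k-1}^*)].$$

Because of the choice of $p_k$, $q_k$ the vectors $g_k$, $h_k$ are non-negative, and the vectors $u_{k+1} - u_k^* = A g_k$, 
$v_k^* - v_{k+1} = A h_k$ are non-zero by (8). Therefore, by (5) 

$$\theta(u_{k+1} - u_k^*, \ v_k^* - v_{k+1}) \le \theta^2(A). \qquad (10)$$

The equation 

$$(p_{k+1} q_{k+1})^{-1} = \theta(u_{k+1} - u_k^*, \ v_k^* - v_{k+1})$$

holds by (6), and this implies the relation 

$$(p_{k+1} q_{k+1})^{-1} \le \theta^2(A).$$

We put $(p_{k+1} q_{k+1})^{-1} = \nu^2$. Then $\nu \ge 1$, and since $p_{k+1} + q_{k+1} \ge 2 (p_{k+1} q_{k+1})^{1/2} = 2/\nu$, 

$$\frac{1 - p_{k+1} q_{k+1}}{(1 + p_{k+1})(1 + q_{k+1})} \le \frac{1 - 1/\nu^2}{1 + 1/\nu^2 + 2/\nu} = \frac{\nu - 1}{\nu + 1}. \qquad (11)$$

From (10), (11) and the monotonicity of the function $(x - 1)/(x + 1)$ there follows the inequality 

$$\frac{1 - p_{k+1} q_{k+1}}{(1 + p_{k+1})(1 + q_{k+1})} \le \frac{\theta(A) - 1}{\theta(A) + 1},$$

which implies (7). The theorem is proved. 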
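
The process (2) with the parameters of Theorem 1 can be tried on a small system. The sketch below is an illustration under stated assumptions, not the author's program: the formulas are read as $u_{k+1} = Au_k^* + f$, $v_{k+1} = Av_k^* + f$, followed by the averaging with $p_{k+1}$ and $q_{k+1}$; the matrix, the right side and the starting vectors are invented; and the convention of returning 0 when a ratio is undefined is a choice made here.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def max_ratio(x, y):
    # Largest p >= 0 with x >= p*y componentwise; zero components of y
    # impose no constraint. Returns 0.0 if no component constrains p.
    ratios = [xi / yi for xi, yi in zip(x, y) if yi > 0]
    return max(0.0, min(ratios)) if ratios else 0.0


def monotone_iteration(A, f, u0, v0, steps):
    u_star, v_star = list(u0), list(v0)
    gaps = []
    for _ in range(steps):
        u = [a + b for a, b in zip(matvec(A, u_star), f)]
        v = [a + b for a, b in zip(matvec(A, v_star), f)]
        du = [a - b for a, b in zip(u, u_star)]      # u_{k+1} - u_k^*
        dv = [a - b for a, b in zip(v_star, v)]      # v_k^* - v_{k+1}
        p, q = max_ratio(du, dv), max_ratio(dv, du)
        u_star = [(a + p * b) / (1 + p) for a, b in zip(u, v)]
        v_star = [(b + q * a) / (1 + q) for a, b in zip(u, v)]
        gaps.append(max(b - a for a, b in zip(u_star, v_star)))
    return u_star, v_star, gaps


A = [[0.4, 0.5], [0.5, 0.4]]          # max row sum 0.9, theta(A) = 1.25
f = [1.0, 2.0]
# u0 = 0 and v0 = (20, 20) satisfy u0 <= A u0 + f and A v0 + f <= v0.
u_star, v_star, gaps = monotone_iteration(A, f, [0.0, 0.0], [20.0, 20.0], steps=6)
```

For this matrix the predicted ratio is $(1.25 - 1)/(1.25 + 1) \cdot 0.9 = 0.1$, against 0.9 for plain successive approximation. The bound is only expected from the second step on, because the starting differences are not of the form $Ag$, $Ah$.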


3. Theorem 1 shows that the process (2) is especially efficient in those cases where the norm 
of the matrix $A$ is close to unity, and the spread of its coefficients is not very great. However, it 
is also useful when the norm of the matrix $A$ is small. The realization of the process (2) on 
a computer requires twice the number of operations of the usual method of successive approximation. 
But the process (2) converges like a geometrical progression with ratio $g = (\theta(A) - 1)(\theta(A) + 1)^{-1} \|A\|$, 
whereas two steps of the usual method reduce the error by the factor $\|A\|^2$. 
Therefore it can be regarded as more efficient than the usual method of successive approximation if 

$$g \le \|A\|^2,$$

that is, 

$$\theta(A) \le (1 + \|A\|)(1 - \|A\|)^{-1}.$$

The satisfaction of the last inequality is easy to check for specific systems. For the estimation of 
$\theta(A)$ it is convenient to use one of the four relations: 

$$\theta(A) \le \max_i \gamma_i, \quad \theta(A) \le \max_{i \ne j} (\gamma_i \gamma_j)^{1/2},$$
$$\theta(A) \le \max_j \nu_j, \quad \theta(A) \le \max_{i \ne j} (\nu_i \nu_j)^{1/2},$$

where 

$$\gamma_i = \max_{s, t} \frac{a_{is}}{a_{it}}, \quad \nu_j = \max_{s, t} \frac{a_{sj}}{a_{tj}}, \quad i, j = 1, 2, \dots, n.$$
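
The efficiency test is cheap to automate. In the sketch below (hypothetical names, invented matrices) the row bound is used in place of the exact value: the largest ratio of the greatest to the smallest entry within a row. Since this is only an upper bound for the $\theta$-norm, a positive answer is sufficient but not necessary. The norm is taken to be the maximum row sum.

```python
def theta_upper_bound(A):
    # gamma_i = max over s, t of a_is / a_it = (largest entry) / (smallest entry) of row i
    return max(max(row) / min(row) for row in A)


def max_row_sum_norm(A):
    return max(sum(abs(a) for a in row) for row in A)


def acceleration_pays_off(A):
    # Sufficient check of theta(A) <= (1 + ||A||) / (1 - ||A||)
    norm = max_row_sum_norm(A)
    return theta_upper_bound(A) <= (1 + norm) / (1 - norm)


well_balanced = [[0.4, 0.5], [0.5, 0.4]]      # norm 0.9, entries of similar size
wide_spread = [[0.01, 0.09], [0.09, 0.01]]    # norm 0.1, entries differ ninefold
```

The first matrix passes the test easily, the second does not: with a small norm and a wide spread of coefficients the plain iteration is already fast and the extra work per step is not recovered.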


4. An iterative process similar to the one described can be used to solve the problem 

$$x = Ax - Bx + f, \qquad (12)$$

where $A$, $B$ are positive matrices satisfying the condition $\|A + B\| < 1$. In this case the initial 
approximations $u_0$, $v_0$ are so chosen that the relations 

$$u_0 \le v_0, \quad u_0 \le A u_0 - B v_0 + f, \quad A v_0 - B u_0 + f \le v_0$$

are satisfied. The successive approximations $u_k$, $v_k$ and $u_k^*$, $v_k^*$ are sought by the formulas 

$$u_0^* = u_0, \quad v_0^* = v_0,$$
$$u_{k+1} = A u_k^* - B v_k^* + f, \quad v_{k+1} = A v_k^* - B u_k^* + f, \qquad (13)$$
$$u_{k+1}^* = \frac{1}{1 + t_{k+1}} (u_{k+1} + t_{k+1} v_{k+1}), \quad v_{k+1}^* = \frac{1}{1 + t_{k+1}} (v_{k+1} + t_{k+1} u_{k+1}),$$

where the parameter $t_k \ge 0$ is so chosen that relations (3) are satisfied. As above, the successive 
approximations converge monotonically to the solution $x$ of problem (12) [1]. It will be shown 
below that if the choice of $t_k$ is subject to some natural condition, then the quantity $\|u_k^* - v_k^*\|$ 
decreases at the rate of a geometrical progression. 


Let $x$, $y$ be vectors with non-negative components. We determine the number $T(x, y)$ by the 
equation 

$$T(x, y) = \max\{t : t \ge 0, \ x \ge t y, \ y \ge t x\}.$$

Obviously, 

$$T(x, y) = \min\left[\min_i (x_i / y_i), \ \min_i (y_i / x_i)\right] \le 1,$$

and if $x$, $y$ are vectors with positive components, then $T(x, y) > 0$. 


Lemma. Let $A$, $B$ be positive matrices, and let $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n)$ be arbitrary 
non-negative vectors, at least one of which is non-zero. Then 

$$T(Ax + By, \ Ay + Bx) \ge d > 0, \qquad (15)$$

$$d = \min\left[\min_{i, j} (a_{ij} / b_{ij}), \ \min_{i, j} (b_{ij} / a_{ij})\right]. \qquad (16)$$
44 tJ 


Proof. We put 

$$c_i = (a_{i1}, \dots, a_{in}, b_{i1}, \dots, b_{in}), \quad d_i = (b_{i1}, \dots, b_{in}, a_{i1}, \dots, a_{in}), \quad i = 1, 2, \dots, n,$$
$$z = (x_1, \dots, x_n, y_1, \dots, y_n).$$

Then 

$$T(Ax + By, \ Ay + Bx) = \min\left[\min_i \frac{(c_i, z)}{(d_i, z)}, \ \min_i \frac{(d_i, z)}{(c_i, z)}\right] \ge \min\left[\min_i \min_{z \ge 0} \frac{(c_i, z)}{(d_i, z)}, \ \min_i \min_{z \ge 0} \frac{(d_i, z)}{(c_i, z)}\right].$$

It is convenient to denote the components of the vectors $c_i$, $d_i$ by $c_{ij}$, $d_{ij}$, $j = 1, 2, \dots, 2n$, 
respectively. Obviously, 

$$\min_{z \ge 0} [(c_i, z)/(d_i, z)] = \min\{\lambda : \lambda = (c_i, z), \ z \ge 0, \ (d_i, z) = 1\}.$$

Therefore 

$$\min_{z \ge 0} [(c_i, z)/(d_i, z)] = \min_j (c_{ij} / d_{ij}), \quad j = 1, 2, \dots, 2n.$$

Similarly, 

$$\min_{z \ge 0} [(d_i, z)/(c_i, z)] = \min_j (d_{ij} / c_{ij}), \quad j = 1, 2, \dots, 2n.$$

The last two equations imply the validity of (15). The lemma is proved. 
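
The lemma can be checked on a small example. The sketch below uses invented matrices and hypothetical function names. The function `T` returns the largest $t \ge 0$ with $x \ge ty$ and $y \ge tx$; components where both vectors vanish are treated as imposing no constraint, which is a convention chosen here.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def T(x, y):
    # Largest t >= 0 with x >= t*y and y >= t*x componentwise.
    ratios = ([a / b for a, b in zip(x, y) if b > 0]
              + [b / a for a, b in zip(x, y) if a > 0])
    return max(0.0, min(ratios)) if ratios else 0.0


def lemma_constant(A, B):
    # d = min over all entries of a_ij / b_ij and b_ij / a_ij
    pairs = [(a, b) for ra, rb in zip(A, B) for a, b in zip(ra, rb)]
    return min(min(a / b, b / a) for a, b in pairs)


A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 1.0], [2.0, 1.0]]
x, y = [1.0, 0.0], [0.0, 1.0]

left = [p + q for p, q in zip(matvec(A, x), matvec(B, y))]    # Ax + By = (3, 2)
right = [p + q for p, q in zip(matvec(A, y), matvec(B, x))]   # Ay + Bx = (2, 5)
t_value, d = T(left, right), lemma_constant(A, B)
```

Here `t_value` is 0.4 and `d` is 1/3, in agreement with the estimate of the lemma, even though the vectors $x$ and $y$ themselves have zero components.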


Theorem 2 


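In the iterative process (13) let the parameters $t_k$ be defined by the equation 

$$t_k = \max\{t : t \ge 0, \ u_k - u_{k-1}^* \ge t (v_{k-1}^* - v_k), \ v_{k-1}^* - v_k \ge t (u_k - u_{k-1}^*)\}. \qquad (17)$$

Then the inequality 

$$\|u_{k+1}^* - v_{k+1}^*\| \le \frac{S(A, B) - 1}{S(A, B) + 1} \|A + B\| \, \|v_k^* - u_k^*\| \qquad (18)$$

holds, where 

$$S(A, B) = \max\left[\max_{i, j} (a_{ij} / b_{ij}), \ \max_{i, j} (b_{ij} / a_{ij})\right].$$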
i,j ij 


Proof. Reasoning as in the proof of Theorem 1, we will assume that for $k < \bar{k}$ 

$$u_k^* \ne u_{k+1}, \quad v_{k+1} \ne v_k^*. \qquad (19)$$

Formulas (13) imply the equations 

$$v_{k+1} - u_{k+1} = (A + B)(v_k^* - u_k^*),$$

$$v_{k+1}^* - u_{k+1}^* = \frac{1 - t_{k+1}}{1 + t_{k+1}} (v_{k+1} - u_{k+1}),$$

$$v_{k+1}^* - u_{k+1}^* = \frac{1 - t_{k+1}}{1 + t_{k+1}} (A + B)(v_k^* - u_k^*).$$

Consequently, 

$$\|v_{k+1}^* - u_{k+1}^*\| \le \frac{1 - t_{k+1}}{1 + t_{k+1}} \|A + B\| \, \|v_k^* - u_k^*\|. \qquad (20)$$

Using formulas (13) we write the equations 

$$u_{k+1} - u_k^* = A g_k + B h_k, \quad v_k^* - v_{k+1} = A h_k + B g_k,$$

$$g_k = \frac{1}{1 + t_k} [(u_k - u_{k-1}^*) - t_k (v_{k-1}^* - v_k)],$$

$$h_k = \frac{1}{1 + t_k} [(v_{k-1}^* - v_k) - t_k (u_k - u_{k-1}^*)].$$

Because of the choice of $t_k$ the vectors $g_k$, $h_k$ are non-negative, and by (19), $u_{k+1} - u_k^* \ne 0$, 
$v_k^* - v_{k+1} \ne 0$. Therefore, the lemma implies the estimate 

$$T(u_{k+1} - u_k^*, \ v_k^* - v_{k+1}) \ge d, \qquad (21)$$

where $d$ is defined by formula (16). In accordance with (17) we have 

$$t_{k+1} = T(u_{k+1} - u_k^*, \ v_k^* - v_{k+1}),$$

therefore (20), (21) imply the estimate 

$$\|u_{k+1}^* - v_{k+1}^*\| \le \frac{1 - d}{1 + d} \|A + B\| \, \|v_k^* - u_k^*\|.$$

This implies (18), since $S(A, B) = 1/d$. 
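
The process (13) with the parameter of Theorem 2 can be illustrated in the same way. The sketch below is written under stated assumptions: the formulas are read as $u_{k+1} = Au_k^* - Bv_k^* + f$, $v_{k+1} = Av_k^* - Bu_k^* + f$ with a common averaging parameter $t_{k+1}$; the matrices, right side and starting vectors are invented; all names are hypothetical.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def T(x, y):
    # Largest t >= 0 with x >= t*y and y >= t*x componentwise.
    ratios = ([a / b for a, b in zip(x, y) if b > 0]
              + [b / a for a, b in zip(x, y) if a > 0])
    return max(0.0, min(ratios)) if ratios else 0.0


def two_sided_iteration(A, B, f, u0, v0, steps):
    u_star, v_star = list(u0), list(v0)
    gaps = []
    for _ in range(steps):
        Au, Av = matvec(A, u_star), matvec(A, v_star)
        Bu, Bv = matvec(B, u_star), matvec(B, v_star)
        u = [a - b + c for a, b, c in zip(Au, Bv, f)]
        v = [a - b + c for a, b, c in zip(Av, Bu, f)]
        t = T([a - b for a, b in zip(u, u_star)],
              [a - b for a, b in zip(v_star, v)])
        u_star, v_star = ([(a + t * b) / (1 + t) for a, b in zip(u, v)],
                          [(b + t * a) / (1 + t) for a, b in zip(u, v)])
        gaps.append(max(b - a for a, b in zip(u_star, v_star)))
    return u_star, v_star, gaps


A = [[0.3, 0.2], [0.2, 0.3]]
B = [[0.2, 0.2], [0.2, 0.2]]      # ||A + B|| = 0.9, S(A, B) = 1.5, d = 2/3
f = [1.0, 2.0]
# u0 = -(20, 20), v0 = (20, 20) satisfy the required initial relations.
u_star, v_star, gaps = two_sided_iteration(A, B, f, [-20.0, -20.0], [20.0, 20.0], steps=6)
```

For these matrices the estimate (18) gives the ratio $(1.5 - 1)/(1.5 + 1) \cdot 0.9 = 0.18$. The exact solution is $x = (10/9, 20/9)$, since $I - A + B = 0.9 I$ here.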


Translated by J. Berry. 
A METHOD OF REGULARIZING THE INVERSE RADON TRANSFORMATION 
IN A MEDICO-BIOLOGICAL PROBLEM* 


N. P. LIPATOV 
Moscow 
(Received 19 June 1975; revised 2 February 1976) 


A REGULARIZING algorithm is constructed for the problem of reconstructing a function from 
its Radon transformation. The two-dimensional case is considered. 


Recently in a number of fields of technology and medicine the problem of the 
reconstruction of a section of a body has become urgent. By the reconstruction of a section we 
mean the determination of the coefficient of linear absorption of radiation $f(x, y)$ at an arbitrary 
point of the section. One possible way to solve this problem is as follows. A number of 
transilluminations of the body investigated are made with a narrow beam of radiation in the 
plane of the section; by processing the resulting data it is possible to obtain the value of 
$f(x, y)$ at an arbitrary point. 


When a narrow beam of radiation passes through the body investigated its intensity on 
emergence is defined as follows: 

$$I(\alpha, p) = I_0 \exp\left[ -\int_{(\alpha, p)} f(x, y) \, dl \right].$$

Here $\alpha$ and $p$ are the parameters of the straight line along which the transillumination occurs, 
and $I_0$ is the intensity of the incident radiation. The symbol $(\alpha, p)$ under the integral sign shows 
that the integration takes place along the straight line with parameters $\alpha$ and $p$. 


It is usually assumed that $p$ cannot assume negative values. But we will permit negative 
values of $p$. It is obvious that the straight lines with parameters $\alpha + \pi$, $p$ and $\alpha$, $-p$ coincide. 


We denote the integral in the exponent by $F(\alpha, p)$. The expression $F(\alpha, p)$ is called 
the Radon transform of the function $f(x, y)$: 

$$F(\alpha, p) = \int_{(\alpha, p)} f(x, y) \, dl = -\ln \frac{I(\alpha, p)}{I_0}.$$


We denote the operator of the transition from f(x, y) to its Radon transform by P. 


Since $I(\alpha, p)$ is known from measurement, and $I_0$ is given, the problem reduces to the 
following: to reconstruct a function from its Radon transform. 
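
The passage from measured intensities to values of the Radon transform is a single logarithm. A minimal sketch, with a hypothetical function name and invented numbers (a homogeneous absorber with coefficient 0.2 crossed along a chord of length 3):

```python
import math


def radon_from_intensity(intensity, incident_intensity):
    # F(alpha, p) = -ln(I(alpha, p) / I_0)
    return -math.log(intensity / incident_intensity)


absorption, chord_length, i0 = 0.2, 3.0, 100.0
measured = i0 * math.exp(-absorption * chord_length)   # attenuation along the ray
line_integral = radon_from_intensity(measured, i0)     # recovers 0.2 * 3.0
```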


Since $F(\alpha + \pi, p) = F(\alpha, -p)$ and $I(\alpha + \pi, p) = I(\alpha, -p)$, the measurements of $I(\alpha, p)$ 
must be made for $0 \le \alpha < \pi$. 


The body investigated has finite dimensions, hence it is natural to assume that f(x, y) 
vanishes outside a circle of radius R with centre at the origin of coordinates. 


It is known that if $f(x, y)$ is continuous, then it is uniquely reconstructed from $F(\alpha, p)$. 
We define $\|f\|$ and $\|F\|$ as follows: 

$$\|f\| = \sup |f(x, y)|, \quad \|F\| = \sup |F(\alpha, p)|.$$

It is obvious that in the case of such norms the problem of the reconstruction of $f$ from $F$ is 
ill-posed, therefore it is advisable to construct a regularizing algorithm. 


Using geometrical considerations we can deduce the following formula: 


2m 6 


ffre y)dz dy = f F(a, p)dp — Pie (fF s)da ) ds, 
3 0 


Ko,0,6 — 2 
where $K_{0,0,\delta}$ denotes the interior of a circle of radius $\delta$ with centre at the origin of coordinates. 


If we denote by $K_{x,y,\delta}$ the interior of a circle of radius $\delta$ with centre at the point $x, y$, 
and by $R[F, \delta]$ the mean value of $f(x, y)$ in the circle $K_{x,y,\delta}$, then from (1) we obtain 


4 
ied fF pap 


i 


2x 
s 
fe gfeag, sorraese [ [Fe stzcosaty sina)da | ds }. 
- (s?—u?)'s 
5 0 


In order to verify that $R[F, \delta]$ is a regularizing operator, we can use the following theorem 
(see [1]). 


Theorem 


Let $A$ be an operator from $V$ into $U$, and $R[u, \delta]$ an operator from $U$ into $V$, defined 
for any element of $U$ and any $\delta > 0$, continuous with respect to $u$. If for any element $z \in V$ 

$$\lim_{\delta \to 0} R[Az, \delta] = z,$$

then the operator $R[u, \delta]$ is a regularizing operator for the equation $Az = u$. 


In our case $V$ is the set of functions continuous in the circle $K_{0,0,R}$ and vanishing outside 
this circle, $U$ is a set of continuous functions, where $F(\alpha + \pi, p) = F(\alpha, -p)$ and $F(\alpha, p) = 0$ 
for $p > R$, and $A = P$. 


We impose another constraint on the system of functions $V$: for any $f(x, y)$ belonging to 
$V$ let the condition 

$$|f(x_1, y_1) - f(x_2, y_2)| \le D ((x_1 - x_2)^2 + (y_1 - y_2)^2)^{1/2}, \qquad (3)$$

be satisfied, where $D$ is a constant independent of $f$. 


We estimate the accuracy of the specification of the original information (relative to the 
accuracy of $I(\alpha, p)$ and $I_0$), which will be sufficient to reconstruct $f(x, y)$ with accuracy $\varepsilon$. For 
this we prove two propositions. 


Proposition 1 


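If $f(x, y)$ satisfies the requirement (3) and $r < \varepsilon/2D$, then the following inequality is 
satisfied: 

$$\left| f(x, y) - \frac{1}{\pi r^2} \iint_{K_{x,y,r}} f(x', y') \, dx' \, dy' \right| < \frac{\varepsilon}{2}. \qquad (4)$$

The proof is obvious. 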
Proposition 2 


Let F(a, p) and G(a, p) belong to U, f(x, y) belong to V and F(a, p) be the Radon 
transform of f(z, y), H(a, p)=F (a, p)-G(@, p), |H(a,p)|<6; then the following inequality 


will be satisfied: 


4p 


at 


| ff f(a, y)dx dy — R[G,6] | < 


Kx,y,6 


Proof. We consider the expression 


1 Oe 
=— H (a, p)d 
. = | J tone 


me an 
1 s 
ae poe [ [te s+zcosary sina) da ds | . 
It (s?—6*) 
8 0 


Since the first integral under the modular sign is independent of a, it can be integrated with 
respect to a between the limits 0 and 27 and divided by 27, without changing its value: 


2x co 


1 1 
pu gecaectall Sz J | Beste cosasy sina) ds de 
a 


mu? 
00 


22 © 


- ras J | #e+a, s+zcos(a+m)+ y sin(a+m)) ds da 
ost 


00 
2a 


4 " $ 
mie | intaieree [ [Ae st2 cosaey sina) da | ds 
1 (s?—6?) 
i) 0 


From (6) it follows that 


3 2n 
1 
§&<-— | f [Hie s+2cosaty sina) da ds 
00 


10262 
co 2% 


| J fae s+zcosa+y sina)da ds 
782 
8 0 


2n 


co 
$ 
- lene [7 st+zcosat+y sina) da ds | ; 
(s?—62)'s 
8 6 


Replacing in the last formula the infinite limits of integration by $2R$ and evaluating the integral, 
we obtain (5). 

It follows from (4) and (5) that to determine the approximate value of $f(x, y)$ with 
accuracy $\varepsilon$ it is sufficient that $|F(\alpha, p) - G(\alpha, p)|$ does not exceed $\pi\varepsilon^2/16D$. Here $G$ is an 
approximate value of $F$, and $F$ is the Radon transform of the function $f$. 
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We now determine the relative accuracy of J(a, p) and Jp. From the inequality 


T (a, p) J (a, p) 
—In 
Ip Jo 
where $\tilde J(\alpha, p)$ and $\tilde J_0$ are approximate values of $J(\alpha, p)$ and $J_0$, it follows that ($\pi\varepsilon^2/16D$ is assumed to be a small quantity) if the relative errors of $J(\alpha, p)$ and $J_0$ do not exceed $\pi\varepsilon^2/32D$, then the $\varepsilon$-accuracy in the determination of $f(x, y)$ will be attained.




In conclusion we note that if on the set of functions $f(x, y)$ we specify the norm of $L_1$ or $L_2$, and $f$ vanishes outside the circle $K_{0,0,R}$, then the problem (the existence of a solution is assumed) will be ill-posed just the same. The operator (2) will be a regularizing operator in this case also.


Translated by J. Berry 
OPTIMAL QUADRATURE FORMULAS FOR A SPHERE*
V. A. GORDIN 
Moscow 
(Received 30 May 1975; revised 15 December 1975) 
THE PROBLEM of finding the statistically optimal quadrature formula for a sphere is posed. A 


system of linear algebraic equations satisfied by weights of the quadrature formula is written 
down. Two examples are given. Asymptotic estimates of the relative error are given. 


1. Let the linear functional

$$f(\varphi)=\int_{S^2}\varphi(x)\,d\nu(x) \qquad (1)$$

be defined in the space of functions on the two-dimensional sphere $S^2$. In practice instead of the functional (1) we often consider a functional of the form

$$F(\varphi)=\sum_{j=1}^{N}a_j\varphi(x_j),\quad\text{where } a_j\in\mathbb{C},\ x_j\in S^2. \qquad (2)$$


Sometimes the statistics of the functions $\varphi$, on which the functionals (1) and (2) act, are known. This situation is encountered, for example, in meteorology [1], where $S^2$ is a natural object, and the statistics are vast.


In this case the problem of the best choice of the coefficients can be given a probabilistic sense: the functional $F$ best approximates the functional $f$ if on it there is attained the minimum

$$\min_{a_j\in\mathbb{C}}M|f(\varphi)-F(\varphi)|^2. \qquad (3)$$


The numbers $a_j$ and the points $x_j$ are chosen deterministically for known statistics.
(Therefore, the method considered is not of the Monte Carlo type.)


2. Let $\varphi$ be a homogeneous centred random field on $S^2$, that is, for every point $x\in S^2$ we have $M\varphi(x)=0$, and for any points $x, y\in S^2$ and any element $g\in SO(3)$ of the group of rotations let the correlation function of this field $B(x, y)=M\{\varphi(x)\varphi(y)\}$ be invariant with respect to $g$: $B(x, y)=B(gx, gy)$.


For such a field the following representation holds (see [2, 3]):

$$\varphi(x)=\sum_{l=0}^{\infty}\sum_{|n|\le l}Y_n^l(x)Z_{ln},$$

where $Y_n^l(x)$ are spherical functions, and $Z_{ln}$ are random quantities, and

$$MZ_{ln}=0,\qquad M\{Z_{ln}Z_{l'n'}\}=\delta_{ll'}\delta_{nn'}\Psi(l),\qquad\Psi(l)\ge 0.$$


3. We consider the functionals $h_{ln}$ operating by the formula

$$h_{ln}\Big(\sum_{l'=0}^{\infty}\sum_{|n'|\le l'}Y_{n'}^{l'}(x)Z_{l'n'}\Big)=Z_{ln},$$

and their linear span $L$. In the linear space $L$ formula (3) specifies the structure of the pre-Hilbert space:

$$(g, h)=M\{g(\varphi)h(\varphi)\}=\sum_{l=0}^{\infty}\sum_{|n|\le l}\Psi(l)\,g(Y_n^l)h(Y_n^l). \qquad (4)$$

We denote by $H$ the corresponding Hilbert space. It is finite-dimensional if and only if $\Psi(l)$ is non-zero for only a finite number of values of $l$.


4. From the theorem of the perpendicular we deduce the following theorem.


Theorem


Let $g_0, g_1, \ldots, g_N\in H$ and let $g_1, \ldots, g_N$ be linearly independent. In the subspace spanned by $g_1, \ldots, g_N$ the best approximation to $g_0$ in the sense of (4) is the functional $g=\sum_{j=1}^{N}a_jg_j$, where the numbers $a_j$ satisfy the system

$$\sum_{i=1}^{N}(g_i, g_j)a_i=(g_0, g_j),\qquad j=1, 2, \ldots, N. \qquad (5)$$

In the case where $g_1, \ldots, g_N\in H$, $g_0\notin H$, but $|(g_0, g_j)|<\infty$ for all $j\ge 1$, the specification of (3) does not have an exact meaning and the solution of system (5) can be called a "weak" solution of problem (3).
In the case considered in paragraph 1, for the conditions of the theorem to be satisfied it is necessary and sufficient that

$$|B(x, x)|<\infty,\qquad\Big|\int_{S^2}\int_{S^2}B(x, y)\,d\nu(x)\,d\nu(y)\Big|<\infty.$$

Since these conditions are not always satisfied, they represent constraints on the random field $\varphi$. For example, for white noise it is impossible to solve the problem of the optimal quadrature formula.


5. The problem of the best approximation of functionals operating on a non-centred random field is a certain complication of the above. Let $\varphi=\varphi_0+\varphi_1$, where $\varphi_0$ is a deterministic field, and $\varphi_1$ is a homogeneous centred random field. Let $|g_j(\varphi_0)|<\infty$ for all $j\ge 0$, let $g_1, \ldots, g_N$ be linearly independent and let $j\ge 1$ exist such that $g_j(\varphi_0)\ne 0$. We will find the minimum (3) in the class of functionals $g=\sum_{j=1}^{N}a_jg_j$ giving an unbiased estimate of $g_0$: $g(\varphi_0)=g_0(\varphi_0)$.


By the method of Lagrange multipliers we obtain a linear system satisfied by the optimal weights $a_i$ in this case:

$$\sum_{i=1}^{N}(g_i, g_j)a_i+\lambda g_j(\varphi_0)=(g_0, g_j),\quad j=1, 2, \ldots, N,\qquad\sum_{i=1}^{N}a_ig_i(\varphi_0)=g_0(\varphi_0). \qquad (6)$$

It is easy to see that with the assumptions made the system (6) is uniquely solvable.
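
Once the inner products are known, systems (5) and (6) are ordinary linear systems. The following is a minimal sketch only, assuming real-valued inner products and weights (the text allows complex $a_j$). The names `solve_linear` and `optimal_weights` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List, Optional, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def optimal_weights(
    gram: Sequence[Sequence[float]],
    rhs: Sequence[float],
    mean_values: Optional[Sequence[float]] = None,
    mean_target: Optional[float] = None,
) -> List[float]:
    """gram[i][j] = (g_i, g_j), rhs[j] = (g_0, g_j).

    Without mean_values the unconstrained system (5) is solved.  With
    mean_values[j] = g_j(phi_0) and mean_target = g_0(phi_0) the bordered
    system (6) is solved; the Lagrange multiplier is discarded.
    """
    n = len(rhs)
    rows = [[gram[i][j] for i in range(n)] for j in range(n)]
    if mean_values is None:
        return solve_linear(rows, rhs)
    bordered = [rows[j] + [mean_values[j]] for j in range(n)]
    bordered.append(list(mean_values) + [0.0])
    solution = solve_linear(bordered, list(rhs) + [mean_target])
    return solution[:n]


# Toy data (hypothetical): two functionals with Gram matrix [[2, 1], [1, 2]].
gram = [[2.0, 1.0], [1.0, 2.0]]
rhs = [3.0, 3.0]
free = optimal_weights(gram, rhs)
unbiased = optimal_weights(gram, rhs, mean_values=[1.0, 1.0], mean_target=1.0)
```

In the constrained call the weights are forced to reproduce the deterministic part exactly, which in this toy case means that they sum to one.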


6. Let $g_j=\delta_{x_j}(x)$, $d\nu(x)=dx$ and $B(x, y)=\exp\{-[\theta(x, y)/a]^2\}$, where $\theta(x, y)$ is the distance along the geodesic between the points $x, y\in S^2$ in radians; a correlation function of this form is characteristic, for example, for geopotential fields [1] for $a=0.205$. The positive-definiteness of this function, that is, the positiveness of the coefficients of its expansion in Legendre polynomials, has been verified numerically. We consider a latitude-longitude template $M_1=\{x_j\}_{j=1}^{98}$ with step in latitude $\Delta\theta=36°$ and in longitude $\Delta\varphi=15°$. The optimal weights $a_j$, depending only on $|\theta|$, and the relative error $\kappa$ of the quadrature formula ($\kappa=\|g-g_0\|\,\|g_0\|^{-1}$) are given in column A of Table 1.


TABLE 1 
The result obtained is easy to interpret: the points at latitudes $\pm 54°$ are noticeably correlated with their neighbours, and consequently carry less information than, for example, the polar points, which are practically not correlated with other points:

$$a_1=a_{98}\approx\Big(\int_{S^2}B(x, y)\,d\nu(y)\Big)[B(x, x)]^{-1}.$$

Therefore, the polar points in (2) occur with the greatest weight.
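
The computation of paragraph 6 can be imitated as follows. This is a rough sketch under stated assumptions, not the author's program: the template is assumed to contain each pole once plus full rows at latitudes $\pm 54°$ and $\pm 18°$ (98 nodes), the weights are taken real, the integral of $B$ over the sphere is approximated by the midpoint rule, and all function names are hypothetical. It uses $(g_i, g_j)=B(x_i, x_j)$, $(g_0, g_j)=\int_{S^2}B(x, x_j)\,dx$ and $\|g_0\|^2=4\pi\int_{S^2}B(x, x_j)\,dx$, so that $\kappa^2=1-\sum_j a_j/4\pi$ for the optimal weights.

```python
import math


def correlation(theta, a=0.205):
    """B(theta) = exp(-(theta / a)^2), the correlation function of paragraph 6."""
    return math.exp(-((theta / a) ** 2))


def geodesic(p, q):
    """Great-circle distance in radians between (lat, lon) points in radians."""
    c = (math.sin(p[0]) * math.sin(q[0])
         + math.cos(p[0]) * math.cos(q[0]) * math.cos(p[1] - q[1]))
    return math.acos(max(-1.0, min(1.0, c)))


def lat_lon_template(dlat_deg, dlon_deg):
    """Assumed layout: each pole once, full rows at the interior latitudes."""
    rows = round(180.0 / dlat_deg) - 1
    cols = round(360.0 / dlon_deg)
    nodes = [(math.pi / 2, 0.0)]
    for r in range(1, rows + 1):
        lat = math.radians(90.0 - r * dlat_deg)
        nodes.extend((lat, math.radians(c * dlon_deg)) for c in range(cols))
    nodes.append((-math.pi / 2, 0.0))
    return nodes


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def optimal_quadrature(nodes, a=0.205, steps=20000):
    """Weights from system (5) and the relative error kappa."""
    h = math.pi / steps
    # (g_0, g_j): integral of B over the sphere, the same for every node.
    integral = 2.0 * math.pi * h * sum(
        correlation((k + 0.5) * h, a) * math.sin((k + 0.5) * h)
        for k in range(steps))
    gram = [[correlation(geodesic(p, q), a) for q in nodes] for p in nodes]
    weights = solve_linear(gram, [integral] * len(nodes))
    kappa = math.sqrt(max(0.0, 1.0 - sum(weights) / (4.0 * math.pi)))
    return weights, kappa


nodes = lat_lon_template(36.0, 15.0)
weights, kappa = optimal_quadrature(nodes)
```

Here `weights[0]` and `weights[-1]` are the polar weights and `weights[1]` belongs to a node at latitude $54°$, so the qualitative statement about the polar points can be checked directly.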


The weights $a_j$ have a similar interpretation for the latitude-longitude template $M_2=\{x_j\}_{j=1}^{1202}$ with step in latitude $\Delta\theta=5.8°$ and in longitude $\Delta\varphi=9°$, see column A of Table 2.


TABLE 2

0.00644   0.00655   0.01234   0.01249
0.00154   0.00157   0.01325   0.01354
0.00316   0.00322   0.01401   0.01430
0.00467   0.00475   0.01464   0.01492
0.00616   0.00629   0.01511   0.01541
0.00757   0.00772   0.01543   0.01574
0.00892   0.00910   0.01559   0.01590
0.01016   0.01037   0.140     0.141
0.01131   0.01155   0.160


7. Now let the situation of paragraph 5 hold; $\varphi_0\equiv 1$, and $\varphi_1$ is the same as in paragraph 6. The weights and the errors for $M_1$ and $M_2$ are given in column B of Tables 1 and 2 respectively.


As a comparison the relative error $\kappa_0$ of the quadrature formula with the template $M_2$ and with the weights $a_j=4\pi/1202$, the same for all points, is given.


8. There are a number of papers (see, for example, [4, 5]), in which the problem of finding quadrature formulas for a sphere, exact on the spherical functions $Y_n^l(x)$ for $l\le M$, is solved. Such a problem is obtained, for example, if we put $\Psi(l)=1$ for $l\le M$ and $\Psi(l)=0$ for $l>M$. Then $\dim H=(M+1)^2$, and, taking $(M+1)^2$ independent functionals $g_j$, we obtain a basis in $H$, with respect to which $g_0\in H$ can be expanded with zero error.


9. The problem of finding the minimum (3) can be generalized, by optimizing not only with respect to the coefficients $a_j$, but also with respect to the templates $\{x_j\}$. It is then possible to consider both arbitrary templates with a given number of points, and also templates with geometrical constraints, for example, to search for the optimal template in the class of latitude-longitude templates or templates invariant with respect to some group. For $\dim H<\infty$ such problems were solved in [4-6].
10. We consider asymptotic estimates of the error. Let $M=\{x_j\}_{j=1}^{N}$ and $U_j=\{x\in S^2 \mid \forall i\ne j,\ \theta(x, x_j)<\theta(x, x_i)\}$. The domains $U_j$ are bounded by a finite number of geodesic arcs and they are convex, and their closures $\bar U_j$ cover the sphere:

$$\bigcup_{j=1}^{N}\bar U_j=S^2.$$

Let

$$\rho_j=\max_{x\in U_j}\theta(x, x_j),\qquad\rho=\max_{1\le j\le N}\rho_j.$$

Then $M$ is a $\rho$-network on $S^2$.
We represent the functional $g_0$ as a sum

$$g_0=\sum_{j=1}^{N}g^j,\qquad\text{where } g^j(\varphi)=\int_{U_j}\varphi(x)\,dx.$$

We note that by the triangle inequality the estimate

$$\Big\|g_0-\sum_{j=1}^{N}a_jg_j\Big\|=\Big\|\sum_{j=1}^{N}g^j-\sum_{j=1}^{N}a_jg_j\Big\|\le\sum_{j=1}^{N}\|g^j-a_jg_j\|$$

holds, and this implies that

$$\min_{a_j\in\mathbb{C}}\Big\|g_0-\sum_{j=1}^{N}a_jg_j\Big\|\le\sum_{j=1}^{N}\min_{a_j\in\mathbb{C}}\|g^j-a_jg_j\|. \qquad (7)$$


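The minimum on the right side of (7) is found by the theorem of the perpendicular: $a_j=(g^j, g_j)(g_j, g_j)^{-1}$, and the error is then

$$\min_{a_j\in\mathbb{C}}\|g^j-a_jg_j\|=((g^j-a_jg_j, g^j))^{1/2}=\Big\{\int_{U_j}\int_{U_j}B(x, y)\,dx\,dy-\Big[\int_{U_j}B(x, x_j)\,dx\Big]^2[B(x_j, x_j)]^{-1}\Big\}^{1/2}. \qquad (8)$$

We estimate (8) as $\rho\to 0$. Let the correlation function $B=B(\theta)$ be a Lipschitz function: a $K>0$ exists such that $|B(\theta_1)-B(\theta_2)|\le K|\theta_1-\theta_2|$. In this case we can estimate the integrands in (8):

$$|B(x, y)-B(x, x_j)|=|B(\theta(x, y))-B(\theta(x, x_j))|\le K|\theta(x, y)-\theta(x, x_j)|\le K\theta(y, x_j),$$

$$|B(x, x_j)-B(x_j, x_j)|\le K|\theta(x, x_j)-\theta(x_j, x_j)|=K\theta(x, x_j),$$

and hence, since the area of $U_j$ does not exceed $\pi\rho_j^2$,

$$\Big|\int_{U_j}\int_{U_j}B(x, y)\,dx\,dy-\int_{U_j}B(x, x_j)\,dx\int_{U_j}dy\Big|\le K\int_{U_j}dx\int_{U_j}\theta(y, x_j)\,dy\le\frac{2K\pi^2\rho_j^5}{3},$$

$$\Big|\int_{U_j}B(x, x_j)\,dx-B(x_j, x_j)\int_{U_j}dx\Big|\le K\int_{U_j}\theta(x, x_j)\,dx\le\frac{2K\pi\rho_j^3}{3}, \qquad (9)$$

$$\Big\{\int_{U_j}\int_{U_j}B(x, y)\,dx\,dy-\Big[\int_{U_j}B(x, x_j)\,dx\Big]^2[B(x_j, x_j)]^{-1}\Big\}^{1/2}\le 2\pi\rho_j^{5/2}\Big(\frac{K}{3}\Big)^{1/2}.$$

Substituting (9) into (7), we obtain

$$\min_{a_j\in\mathbb{C}}\Big\|g_0-\sum_{j=1}^{N}a_jg_j\Big\|\le\sum_{j=1}^{N}\Big(\frac{4\pi^2K\rho_j^5}{3}\Big)^{1/2}\le 2\pi N\rho^{5/2}\Big(\frac{K}{3}\Big)^{1/2}. \qquad (10)$$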


By [7], there exists a $C>0$ such that for every $\varepsilon>0$ there exists an $M=\{x_j\}_{j=1}^{N}$ such that $M$ is an $\varepsilon$-mesh on $S^2$ and $N\le C\varepsilon^{-2}$. (In [7] the general result is given. In the case of a two-dimensional sphere we can take as $M$ a latitude-longitude template. For the estimate of $C$ for $S^2$ see [8].) Consequently the following theorem is proved.


Theorem 


Let $B(\theta)$ be a Lipschitz function with constant $K$. There exists a template with the number of points not exceeding $N$ and with an error $\kappa$ not exceeding


8? gs? 


It is easy to see that for sufficiently small $\rho$ the weights $a_j=(g^j, g_j)(g_j, g_j)^{-1}$ are positive.


The estimate (11) is not covered by an estimate in the Monte Carlo method [9], where the error is estimated for a fixed function in $L^2(S^2)$. The realizations of the random field are formal series of spherical functions and are not bound to belong to $L^2(S^2)$ (see also paragraph 1).


11. Let

$$B(\theta)\in C^2[0, \pi],\qquad K=\max_{\theta\in[0, \pi]}|B''_{\theta\theta}(\theta)|.$$

In this case $B'_\theta(0)=0$ and the estimate (11) can be improved by estimating the Taylor series for $B$:

$$|B(x, y)-B(x_j, x_j)|\le K\frac{[\theta(x, x_j)+\theta(y, x_j)]^2}{2},$$





2 "fy np;° , 
{ ff Benazav - [ [acura | (BC, 201" } <— (13m) 
Uj 


Uy; Uy 


Instead of (10) and (11) the following estimates hold: 


~ Nap 
min |} g0 — S08; |<" #30" 


ajeC 


jamt 
The assertions of paragraphs 10, 11 remain true if the Lipschitz property or differentiability of the function $B$ is assumed only on the segment $[0, \varepsilon_0]$ and integrability on the segment $[0, \pi]$.


12. Let the functional (2) act not on the field $\varphi$ itself, but on a homogeneous random field $\eta$, homogeneously connected with $\varphi$ (noise is applied to the field $\varphi$). In this case it is easy to write down the analog of system (2).


13. Similar investigations can be carried out on the homogeneous space of any compact 
Lie group. The corresponding definitions and theorems on homogeneous random fields are given 


in [2, 3]. 


Translated by J. Berry. 
EFFICIENT MONTE CARLO ALGORITHMS FOR EVALUATING
THE CORRELATION CHARACTERISTICS OF CONDITIONAL 
MATHEMATICAL EXPECTATIONS* 


G. A. MIKHAILOV 
Novosibirsk 


(Received 10 March 1975) 


IN THE proposed algorithms only two samples of the conditional distribution are used for each 
value of a condition. It is shown that in this way it is possible to calculate efficiently the 
correlation characteristics of the solution of the particle transfer equation with random 
coefficients and the optimal parameters for the “splitting method”. 


We consider the random vector quantity

$$\xi=\xi(\omega, \sigma)=\{\xi_1(\omega, \sigma), \ldots, \xi_n(\omega, \sigma)\},$$

where $\omega$ is a random point of some abstract space (for example, the trajectory of a Markov chain); a joint probability distribution is specified for $\omega$ and the random vector $\sigma=\{\sigma_1, \ldots, \sigma_m\}$.


We have to estimate the correlation (otherwise, the "autocorrelation") matrix for the random vector of the conditional mathematical expectations

$$J=J(\sigma)=M_\omega(\xi\mid\sigma)$$

and the "mutual-correlation" moments for the vectors $J$ and $\sigma$, that is, the quantities

$$K[J_k, J_j]=M_\sigma[(J_k-J_k^0)(J_j-J_j^0)],\qquad k, j=1, 2, \ldots, n,$$

$$K[J_k, \sigma_i]=M_\sigma[(J_k-J_k^0)(\sigma_i-\sigma_i^0)],\qquad k=1, 2, \ldots, n,\quad i=1, 2, \ldots, m.$$

Here $\sigma_i^0=M\sigma_i$, $J_k^0=MJ_k$, and the subscript of the sign of the mathematical expectation defines the distribution to which it corresponds.


The numerical characteristics indicated are easily computed by the Monte Carlo method if for each sample value of $\sigma$ the vector $J(\sigma)$ is determined exactly. It will be shown below that this can be done by estimating $J(\sigma)$ at random with respect to one (for mutual-correlation moments) or with respect to two (for autocorrelation moments) values of $\omega$. For mutual-correlation moments this algorithm follows directly from the relation

$$K[J_k, \sigma_i]=M_\sigma[(J_k-J_k^0)(\sigma_i-\sigma_i^0)]=M_\sigma[(\sigma_i-\sigma_i^0)M_\omega(\xi_k-J_k^0\mid\sigma)]=M_{(\sigma, \omega)}[(\sigma_i-\sigma_i^0)(\xi_k-J_k^0)]=K[\sigma_i, \xi_k],$$

if the latter correlation moment exists. Let $\omega^{(1)}$, $\omega^{(2)}$ be independent sample values of $\omega$ for a specified value of $\sigma$ and $\xi_k^{(1)}=\xi_k(\omega^{(1)}, \sigma)$, $\xi_j^{(2)}=\xi_j(\omega^{(2)}, \sigma)$. Then

$$K[\xi_k^{(1)}, \xi_j^{(2)}]=M_{(\omega^{(1)}, \omega^{(2)}, \sigma)}[(\xi_k^{(1)}-J_k^0)(\xi_j^{(2)}-J_j^0)]=M_\sigma[M_\omega(\xi_k^{(1)}-J_k^0\mid\sigma)M_\omega(\xi_j^{(2)}-J_j^0\mid\sigma)]=K[J_k, J_j].$$

The representations obtained

$$K[J_k, \sigma_i]=K[\xi_k, \sigma_i],\qquad K[J_k, J_j]=K[\xi_k^{(1)}, \xi_j^{(2)}]$$

may be called "randomized". It should be noted that randomization is widely used to construct efficient algorithms of the Monte Carlo method (see, for example, [1], pp. 80, 94, 110, 116).


By a similar method we can estimate the higher central moments of the vector $J(\sigma)$ of order $N$ by using $N$ sample values of $\omega$. Below we explain another method of calculating the autocorrelation moments, which may give more accurate estimates.


We first consider the estimate of the variance $DJ_k$. The equation

$$D\xi_k=DJ_k+M_\sigma D(\xi_k\mid\sigma) \qquad (1)$$

is well known. Let $\xi_k'=(\xi_k^{(1)}+\xi_k^{(2)})/2$. It is obvious that

$$D\xi_k'=DJ_k+0.5M_\sigma D(\xi_k\mid\sigma). \qquad (2)$$

From (1) and (2) we deduce the formula

$$DJ_k=2D\xi_k'-D\xi_k.$$

It happens that relations (1) and (2) are valid for an arbitrary autocorrelation moment also, that is,

$$K[\xi_k, \xi_j]=K[J_k, J_j]+M_\sigma K(\xi_k, \xi_j\mid\sigma),$$

$$K[\xi_k', \xi_j']=K[J_k, J_j]+0.5M_\sigma K(\xi_k, \xi_j\mid\sigma).$$

The last equations follow directly from the expression for the correlation moment and the properties of mathematical expectations. Accordingly,

$$K[J_k, J_j]=2K[\xi_k', \xi_j']-K[\xi_k, \xi_j]. \qquad (3)$$

Therefore, all the correlation moments required can be estimated by simulating for each value of $\sigma$ two values of $\omega$ and calculating the statistical estimates of the corresponding moments for the vectors $\xi$ and $\xi'$. It is obvious that the moments of $\xi$ can be estimated by averaging the estimates of the corresponding mathematical expectations obtained for $\omega^{(1)}$ and $\omega^{(2)}$. Here the estimates of the moments for $\xi$ and $\xi'$ will be strongly dependent and the accuracy of the calculations by formula (3) may be high. Two applications of the algorithms explained will be considered below.
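
The variance formula $DJ_k=2D\xi_k'-D\xi_k$ can be tried on a toy model. The sketch below is illustrative only: the model ($\sigma$ standard normal, $\xi=\sigma+$ independent standard normal noise, so that $J(\sigma)=\sigma$ and $DJ=1$), the sample size, the seed and the function names are all assumptions made here, not part of the paper.

```python
import random


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_of_conditional_mean(draw_sigma, draw_xi, n_sigma, rng):
    """Estimate D J = 2 D xi' - D xi from two values of omega per value of sigma."""
    primed = []   # xi' = (xi(1) + xi(2)) / 2
    single = []   # all individual values xi(1), xi(2)
    for _ in range(n_sigma):
        sigma = draw_sigma(rng)
        xi1 = draw_xi(sigma, rng)   # first conditional sample
        xi2 = draw_xi(sigma, rng)   # second, independent conditional sample
        primed.append(0.5 * (xi1 + xi2))
        single.extend((xi1, xi2))
    return 2.0 * sample_variance(primed) - sample_variance(single)


# Toy model (assumption): J(sigma) = sigma, so the exact answer is D J = 1.
rng = random.Random(2024)
estimate = variance_of_conditional_mean(
    lambda r: r.gauss(0.0, 1.0),
    lambda s, r: s + r.gauss(0.0, 1.0),
    20000,
    rng,
)
```

With the variance of $\xi$ estimated from the pooled values, the result stays close to the sample covariance of the two conditional samples, which ties this estimate to the randomized representation $K[J_k, J_j]=K[\xi_k^{(1)}, \xi_j^{(2)}]$.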


1. Calculation of the correlation characteristics of the solution of the transfer equation with random coefficients. Here we will suppose that $\sigma$ is the vector of the random coefficients of the transfer equation (for example, the scattering or absorption coefficients), $\omega$ is a random trajectory of a particle, and $J_k(\sigma)$ is some functional of the solution of the transfer equation, for example the flow of particles at a given point. The relation $J_k(\sigma)=M_\omega\xi_k(\omega, \sigma)$ means that $\xi_k$ is an unbiased estimate for the quantity $J_k$. The relations obtained above show that for the estimation of the mutual and auto-correlation moments of the vector $J(\sigma)$ it is not at all necessary to solve the transfer equation exactly for each realization of the vector $\sigma$; these estimates can be obtained by simulating for each realization of $\sigma$ only two trajectories altogether; estimates of the higher moments can be obtained by simulating the corresponding number of trajectories.


An approximate estimate of the moments sought can also be constructed by the standard 
method of linearization on the basis of the calculation of the corresponding derivatives by the 
Monte Carlo method. Algorithms for estimating the derivatives of the particle flow (intensity) with respect to
the coefficients of the transfer equation are presented and validated in [1]. We also note that
in [2] approximate algorithms for estimating the moments of the intensity are constructed on the 
basis of perturbation theory for a statistically homogeneous model of the medium. 


There is a series of problems of transfer theory for whose complete solution it is necessary 
to calculate the statistical moments of the intensity. These are, for example, the important 
problems of the scattering of light in the atmosphere, whose optical characteristics are subjected 


to continual random variations. 


2. Estimation of the optimal parameters by the splitting method. Suppose it is necessary to calculate the mathematical expectation of the function $\xi(\omega, \sigma)$, that is, the quantity

$$J^0=M_{(\sigma, \omega)}\xi(\omega, \sigma).$$

For each realization of $\sigma$ it is possible to simulate $n$ independent values of $\omega$ and use the expression

$$J^0=M\zeta^{(n)}=M\,n^{-1}\sum_{k=1}^{n}\xi(\omega_k, \sigma).$$

The variance of the random quantity $\zeta^{(n)}$ is expressed by the formula

$$D\zeta^{(n)}=A_1+A_2/n,$$

where $A_1=DJ(\sigma)$, $A_2=M_\sigma D(\zeta^{(1)}\mid\sigma)$. The time required to obtain one realization of $\zeta^{(n)}$ on the computer is

$$t^{(n)}=t_1+nt_2,$$

where $t_1$ corresponds to the choice of $\sigma$, and $t_2$ to the choice of $\omega$. The well known optimal value of the parameter

$$n=\Big(\frac{A_2t_1}{A_1t_2}\Big)^{1/2}$$

yields a minimum value of the product $t^{(n)}D\zeta^{(n)}$. The direct estimation of the quantities $A_1$, $A_2$, $t_1$ and $t_2$ is difficult. However, it is possible to obtain statistical estimates of the variance and times for two values of the parameter, $n_1$ and $n_2$, and solve the corresponding systems of linear equations, that is, use the equations

$$A_1=\frac{n_2D\zeta^{(n_2)}-n_1D\zeta^{(n_1)}}{n_2-n_1},\qquad A_2=\frac{n_1n_2}{n_2-n_1}\big(D\zeta^{(n_1)}-D\zeta^{(n_2)}\big),$$

$$t_1=\frac{n_2t^{(n_1)}-n_1t^{(n_2)}}{n_2-n_1},\qquad t_2=\frac{t^{(n_2)}-t^{(n_1)}}{n_2-n_1}.$$
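
The two-point estimation of the splitting parameters amounts to solving two small linear systems. A minimal sketch follows, assuming the models $D\zeta^{(n)}=A_1+A_2/n$ and $t^{(n)}=t_1+nt_2$ hold exactly and that $n$ is chosen to minimize the product $t^{(n)}D\zeta^{(n)}$; the function name and the numbers in the example are hypothetical.

```python
import math
from typing import NamedTuple


class SplittingParameters(NamedTuple):
    a1: float
    a2: float
    t1: float
    t2: float
    n_opt: float


def estimate_splitting(n1: float, var1: float, time1: float,
                       n2: float, var2: float, time2: float) -> SplittingParameters:
    """Recover A1, A2, t1, t2 from variance and time measured at n1 and n2."""
    if n1 == n2:
        raise ValueError("two different values of n are required")
    a1 = (n2 * var2 - n1 * var1) / (n2 - n1)
    a2 = n1 * n2 * (var1 - var2) / (n2 - n1)
    t1 = (n2 * time1 - n1 * time2) / (n2 - n1)
    t2 = (time2 - time1) / (n2 - n1)
    # n minimizing (A1 + A2 / n) * (t1 + n * t2); in practice it is rounded.
    n_opt = math.sqrt(a2 * t1 / (a1 * t2))
    return SplittingParameters(a1, a2, t1, t2, n_opt)


# Synthetic data generated from A1 = 2, A2 = 8, t1 = 3, t2 = 1:
# n = 1 gives variance 10 and time 4; n = 4 gives variance 4 and time 7.
params = estimate_splitting(1, 10.0, 4.0, 4, 4.0, 7.0)
```

With noisy statistical estimates the recovered $A_1$ or $t_1$ can come out non-positive, in which case the square root is undefined and more samples are needed.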





Here it is useful to correlate the samples of values $\zeta^{(n_1)}$, $\zeta^{(n_2)}$. By a similar method we can calculate the parameters of a multiple splitting, for which

$$D\zeta=A_0+\frac{A_1}{n^{(1)}}+\frac{A_2}{n^{(1)}n^{(2)}}+\ldots+\frac{A_k}{n^{(1)}n^{(2)}\ldots n^{(k)}},$$

$$t=t_0+n^{(1)}t_1+n^{(1)}n^{(2)}t_2+\ldots+n^{(1)}n^{(2)}\ldots n^{(k)}t_k.$$

Here (see [3]) $A_i$ is the mean value of the conditional variance corresponding to the $i$-th splitting, and $t_i$ is the mean time of the realization of one experiment in the limits from the $i$-th to the $(i+1)$-th splitting. It is useful to construct the simulation in such a way that the chain of splittings is as far as possible homogeneous and the following equations are satisfied:

$$\frac{A_i}{A_{i+1}}=a,\qquad\frac{t_i}{t_{i+1}}=b.$$

Here (see [3]) the optimal values of the $n^{(i)}$ are identical: $n^{(1)}=\ldots=n^{(k)}=n$. After calculating $D\zeta$ and $t$ for two values $n=n_1, n_2$, we obtain for $a$ and $b$ the equations

$$\frac{D\zeta^{(n_1)}}{D\zeta^{(n_2)}}=\Big(1+\sum_{i=1}^{k}(an_1)^{-i}\Big)\Big(1+\sum_{i=1}^{k}(an_2)^{-i}\Big)^{-1},$$

$$\frac{t^{(n_1)}}{t^{(n_2)}}=\Big(1+\sum_{i=1}^{k}(n_1/b)^{i}\Big)\Big(1+\sum_{i=1}^{k}(n_2/b)^{i}\Big)^{-1},$$

which are easily solved on a computer, after which the optimal value of $n$ is determined by the formula $n=(b/a)^{1/2}$.


Translated by J. Berry. 
A SEARCH SCHEME FOR APPROXIMATE SOLUTIONS OF
THE CONVEX PROGRAMMING PROBLEM* 


V. Yu. LEBEDEV 
Moscow 


(Received 29 May 1975; revised 29 September 1975) 


THE POSSIBILITY of using the weighted functional method to solve convex programming 
problems is proved. For problems with a strictly concave target function a modification of the 
method is proposed which in simple cases permits the difference between the value of the 
target function in the approximate solution generated and the actual optimum to be estimated. 


Many papers have been published devoted to methods of solving mathematical 
programming problems by using the technique of smooth penalty functions. In particular, 
much attention has been given to the method of the modified Lagrange function, as one of the 
most promising for solving convex problems [1, 2]. The absence of the need to increase the 
penalty parameter so as to improve the solution, and a number of other general features 
associate it with the method of weighted functional, described in [3, 4]. 


The idea of solving a conditional maximum problem by the successive unconditional 
minimization of a weighted functional with step-by-step improvement of the estimate of the 
target function optimum, converging up to the actual optimum, is apparently due to Morrison. 
In [5] he proposed a similar scheme for a problem with constraints of the equality type. It was 
generalized to the case of inequalities in [6]. A scheme based on the same idea, but considerably 
faster and differing in the formula for recalculating the estimates, was proposed in [3], where 
with certain assumptions its local convergence in the non-linear problem to the conditional 
extremum was proved. For the linear case the global convergence of this scheme after a finite 
number of steps and its stability to inaccuracies of the solution of the auxiliary problems were 
established in [4]. In the present paper the global convergence and stability of the scheme are 
proved for the convex case. 


We consider a problem of the following form: 


$$f(x)\to\max,\qquad\varphi(x)\le 0. \qquad (1)$$

Here $x\in R^n$, $f(x)$ is a scalar concave function in $R^n$; $\varphi(x)$ is a vector function each of whose $m$ coordinates is a scalar function convex in $R^n$. It is assumed that $f(x)$, $\varphi(x)$ are continuously differentiable and that the set of permissible points of problem (1) is bounded and not empty. Below the maximum value of the target function in problem (1) is denoted by $f(x^*)$.


We introduce the so-called weighted functional:

$$\Psi(x, d)=\frac{1}{2}[d-f(x)]_+^2+\frac{1}{2}\sum_{i=1}^{m}\varphi_{i+}^2(x). \qquad (2)$$

Here $d$ is a scalar quantity which henceforth we will call the estimate of the target function of problem (1), and by definition, $z_+=\max\{0, z\}$. The function $\Psi(x, d)$ is convex and continuously differentiable with respect to its variables. Because of the boundedness of the permissible set of problem (1), the set of points of the form $\{x: \|\varphi_+(x)\|_E^2\le\sigma\}$ will be bounded for any non-negative values of $\sigma<+\infty$ (see [7]). This implies that for any $d$ the function $\Psi(x, d)$ attains a minimum with respect to $x$ on a compact set. This minimum will be positive if $d>f(x^*)$, and equal to zero otherwise.


The following scheme of search for an approximate solution of problem (1) is proposed. Let there be given an initial estimate $d_1$, a point $x_0$ and two accuracy parameters $\varepsilon_1>0$, $\varepsilon_2>0$. A point $x_1$ is sought for which

$$\Psi(x_1, d_1)\le\Psi(x_0, d_1)$$

and either

$$\|\Psi_x'(x_1, d_1)\|_E\le\varepsilon_1(d_1-f(x_1)),\qquad\Psi(x_1, d_1)>\varepsilon_2, \qquad (3)$$

or

$$\Psi(x_1, d_1)\le\varepsilon_2. \qquad (4)$$

If (4) is satisfied, the process ceases. Otherwise the quantity $d_2$ is calculated:

$$d_2=d_1-\frac{2\Psi(x_1, d_1)}{d_1-f(x_1)}, \qquad (5)$$

and beginning with $d_2$, $x_1$ the procedure is repeated. The method of choosing the points $x_k$ within the limits of the constraints (3), (4) is not specified here. In any case, any convergent algorithm for minimizing the function $\Psi(x, d)$ with respect to $x$ ensures the constructive possibility of finding them.
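
The search scheme can be illustrated on a one-dimensional toy problem. The sketch below rests on several assumptions that are not in the paper: the weighted functional is taken with the factors $1/2$; each intermediate problem is solved almost exactly by a ternary search (so only the stopping test on $\Psi$ with $\varepsilon_2$ is used); the estimate is recalculated in the form $d_{k+1}=f(x_k)-\|\varphi_+(x_k)\|_E^2/(d_k-f(x_k))$, which is the form into which the proof of Theorem 1 rewrites the recalculation formula; and the function names and the test problem ($f(x)=-(x-2)^2$, $\varphi(x)=x-1$, optimum $x^*=1$, $f(x^*)=-1$) are illustrative.

```python
from typing import Callable, Sequence, Tuple


def plus(value: float) -> float:
    """z_+ = max{0, z}."""
    return max(0.0, value)


def weighted_functional(x: float, d: float,
                        f: Callable[[float], float],
                        constraints: Sequence[Callable[[float], float]]) -> float:
    violation = sum(plus(phi(x)) ** 2 for phi in constraints)
    return 0.5 * plus(d - f(x)) ** 2 + 0.5 * violation


def argmin_convex(g: Callable[[float], float], lo: float, hi: float,
                  iterations: int = 200) -> float:
    """Ternary search for a minimizer of a convex function on [lo, hi]."""
    for _ in range(iterations):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if g(left) < g(right):
            hi = right
        else:
            lo = left
    return 0.5 * (lo + hi)


def weighted_functional_search(f, constraints, d, lo, hi,
                               eps2: float = 1e-10,
                               max_steps: int = 50) -> Tuple[float, float]:
    """Return the stopping point x and the final estimate d."""
    x = 0.5 * (lo + hi)
    for _ in range(max_steps):
        x = argmin_convex(
            lambda t: weighted_functional(t, d, f, constraints), lo, hi)
        gap = d - f(x)
        if weighted_functional(x, d, f, constraints) <= eps2 or gap <= 0.0:
            break  # stopping test on the value of the weighted functional
        violation = sum(plus(phi(x)) ** 2 for phi in constraints)
        d = f(x) - violation / gap  # recalculation of the estimate
    return x, d


# Toy problem (assumption): maximize -(x - 2)^2 subject to x - 1 <= 0,
# starting from the overestimate d = 0 of the optimum value -1.
x_stop, d_stop = weighted_functional_search(
    lambda x: -(x - 2.0) ** 2, [lambda x: x - 1.0], 0.0, -5.0, 5.0)
```

On this problem the estimates $d_k$ decrease towards $-1$ and the stopping point lies close to $x^*=1$; a general implementation would replace the ternary search by any convergent minimization method, as the text notes.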


The following theorem holds. 


Theorem 1 


For any values of $d_1$, $x_0$, $\varepsilon_1>0$, $\varepsilon_2>0$ the process (2)-(5) ceases after a finite number of steps, where the points of its stopping, obtained for fixed $x_0$, $d_1>f(x^*)$ and different $\varepsilon_1$, $\varepsilon_2$, will as $\varepsilon_1, \varepsilon_2\to 0$ converge to the set of solutions of problem (1).


Proof. We prove the first part of the assertion of the theorem indirectly. We assume that $d_1$, $x_0$, $\varepsilon_1>0$, $\varepsilon_2>0$ exist such that as a result of the realization of the process (2)-(5) infinite sequences $\{d_k\}$, $\{x_k\}$ will be obtained for which

$$\Psi(x_k, d_k)>\varepsilon_2.$$

Formula (5) for the quantities $d_{k+1}$ can be rewritten as follows:

$$d_{k+1}=f(x_k)-\frac{\|\varphi_+(x_k)\|_E^2}{d_k-f(x_k)}=f(x_k)-(p_k, \varphi(x_k)),\qquad p_k=\frac{\varphi_+(x_k)}{d_k-f(x_k)}\ge 0.$$

Condition (3) then takes the form

$$\|\Psi_x'(x_k, d_k)\|_E=(d_k-f(x_k))\Big\|\frac{df(x_k)}{dx}-p_k\frac{d\varphi(x_k)}{dx}\Big\|_E\le\varepsilon_1(d_k-f(x_k)). \qquad (6)$$

We introduce the vector

$$\Delta c_k=\frac{df(x_k)}{dx}-p_k\frac{d\varphi(x_k)}{dx}.$$

It is obvious from (6) that $\|\Delta c_k\|_E\le\varepsilon_1$.


We consider a linear programming problem of the form 


dg (zx) 


, 2) > max, 
dz 


(7) 


(z—zp) + P(z,) S 0. 
dz 


The vector $p_k$ is a permissible vector of the problem dual to (7). Because of the convexity of the functions $\varphi_i(x)$, any of the permissible vectors of problem (1) will satisfy the constraints of problem (7), that is, the permissible set of problem (7) and that of its dual are not empty. This implies that problem (7) has a solution. We denote it by $x'$. By duality
(r. dg (zx) " - (2) ) cy df(zn) rae ). 
dz d 


x 


This inequality can only be strengthened if on its right side x’ is replaced by any solution x* of 
problem (1). Therefore, we have 


d d 
dz dz 


Also, the concavity of the function f(x) implies that 


df (xx) df (xx) ° 
’ k . 


v ) > 12")- Hen ( 


Taking into account (8), we obtain from this 







df (xx) 
d 


d 
( >» Os as ) > f(a") — f(a) + ( 
dz 


; a) — (Ac,, z*). 


Consequently, 





df (zx) dg (zr) 
x Pr ’ ns) 


dues = f(x) — (Drs @(tn)) = f (2*) + ( 
dz dz 


— (Acy, 2*) = f (xz*) + (Acy, 2,—2°). 


Since the points $x_k$ and $x^*$ belong to the bounded set $\{x: \|\varphi_+(x)\|_E^2\le 2\Psi(x_0, d_1)\}$, a number $M>0$ exists, independent of the number $k$ and such that $\|x_k-x^*\|_E\le M$. Therefore,

$$d_{k+1}\ge f(x^*)-\varepsilon_1M.$$

Therefore the sequence $\{d_k\}$ decreases monotonically and is bounded below. Therefore,

$$\lim_{k\to\infty}(d_k-d_{k+1})=0,$$

and since $d_k-d_{k+1}\ge[\Psi(x_k, d_k)]^{1/2}\ge 0$, it is obvious that

$$\lim_{k\to\infty}\Psi(x_k, d_k)=0.$$

This contradicts the initial supposition that $\Psi(x_k, d_k)>\varepsilon_2>0$. The contradiction proves the first assertion of the theorem. The second assertion is proved in exactly the same way as in the linear case considered in [4].


Theorem 1 shows that, possessing an overestimate of the optimum of problem (1) and having chosen sufficiently small values $\varepsilon_1>0$, $\varepsilon_2>0$, it is possible to rely on being able to obtain a better approximate solution by the scheme of (2)-(5). If there is no satisfactory prior estimate, it is possible to begin the search for a solution by maximization of the ordinary penalty function

$$\Psi(x)=f(x)-\sum_{i=1}^{m}\varphi_{i+}^2(x),$$

continuing the calculations until at the current point $x_0$ the condition $\|\Psi_x'(x_0)\|_E\le\varepsilon_1$ is satisfied, after which, putting $d_1=f(x_0)-\|\varphi_+(x_0)\|_E^2$, we return to the scheme (2)-(5). The "extended" scheme thus obtained, beginning with an arbitrary point, will cease for any $\varepsilon_1>0$, $\varepsilon_2>0$ after a finite number of steps, giving excellent solutions for small values of $\varepsilon_1$, $\varepsilon_2$.


It is obvious from the proof of Theorem 1 what will happen if in (3), (4) we put $\varepsilon_1=\varepsilon_2=0$, that is, solve the intermediate problems of minimizing the weighted functional completely. Then beginning with $d_1>f(x^*)$ the scheme of (2)-(5) either finds the exact solution of problem (1) after a finite number of steps (as, for example, in the linear case [4]), or constructs a maximizing sequence $\{x_k\}$ and a sequence $\{d_k\}$ converging to the optimum of problem (1) from the right. In the case where the function $f(x)$ is strictly concave, the inequalities $d_k>f(x^*)$ can be ensured even without solving the intermediate problems exactly. For this in the scheme (2)-(5) it is necessary to replace condition (3) of the recalculation of the estimate $d_k$ by another of the following form:

$$d_k>f(x_k),$$

$$\|\Psi_x'(x_k, d_k)\|_E\le\varepsilon_1\min\{\|\varphi_+(x_k)\|_E^2,\ \|\varphi_+(x_k)\|_E(d_k-f(x_k))^{1/2}\}, \qquad (10)$$

$$\Psi(x_k, d_k)>\varepsilon_2>0.$$
The following theorem holds. 


Theorem 2 


Let the function $f(x)$ be twice continuously differentiable, let there exist a positive constant $\mu$ such that for any $x\in R^n$, $y\in R^n$ the inequality


d*f (x) 
(vv) <~ Palate 
is satisfied, and at $x^*$, the solution of problem (1), let the derivative $df(x^*)/dx$ be non-zero. (For $df(x^*)/dx=0$ and small values of $\varepsilon_1>0$ the system (10) is inconsistent.) Then for sufficiently small values of $\varepsilon_1>0$ and any $x_k$, $d_k$, $\varepsilon_2>0$ satisfying conditions (10), the value of $d_{k+1}$ calculated by formula (5) will be not less than $f(x^*)$.


Proof. We take any $\varepsilon_1>0$ provided that

$$\varepsilon_1\le\min\Big\{\frac{\mu}{2\|df(x^*)/dx\|_E},\ \Big(\frac{\mu}{2}\Big)^{1/2}\Big\}$$

is satisfied, and let $x_k$, $d_k$, $\varepsilon_2>0$ satisfy the inequalities (10). Then for $d_{k+1}$, calculated by formula (5), it is possible, taking into account the strict concavity of $f(x)$, to write down an analog of the estimate (9) obtained in the proof of Theorem 1:


dno i Sf (z*) +p llan—z*lle?2+ (Ac, Ta—-2*). 


lAcelle<e, min {5;, (5) "*}, On=llp+ (2x) Ile?/[da—f (x) ]. 


From this, allowing for (11), we obtain 


dros > f(r") + wllzx—2° lle? — €: min {5z, (6x) *} lza—z' lle 


> f(2*) + pll2x—2'llz — min si Hs) Viewed 
> f(x Zp—-2' lz — min J ———————-,_ {[——- - ’ 
ier iotled (ores. & :) } ro ae 


Moreover, by the concavity of f(x), 


laf (x*)/dz\l|z\|zx—2" |!e = (df (x*)/dz, z,—z*) 
> f (tx) — f(z") = f (tn) — nas + data — F(z") 


6 "a 
ee reer (6 ) } l2,—2'lle. 
idf(a*)/az’ \ 2 


Consequently, 







pllz,—2* |x? 


df (x*) : wdr 
> Spy|lz,—2' lle ( , + min ine otees ( 


z= E 


: p65, yp Yo 
a eer (— 4 | \ sna" 
QIldf(z*)/dzllz’ \ 2 


Q\\df(x*)/dzllz’ 


that is, 


which is what it was required to prove. 


The scheme (2), (10), (4), (5) will be finite for any $d_1$, $x_0$, $\varepsilon_2>0$ and any sufficiently small $\varepsilon_1$ (by (12)), provided that the target function of problem (1) is strictly concave and $df(x^*)/dx\ne 0$. Then the intermediate problems can be solved by any convergent algorithm for the minimization of $\Psi(x, d)$ with respect to $x$. Therefore, if we are able efficiently to estimate the quantities $\mu$ and $\|df(x^*)/dx\|_E$ from below and from above, respectively (as, for example, in the problem of finding the minimum of the distance from a given point to a polyhedron), then, having $d_1>f(x^*)$ and using the scheme (2), (10), (4), (5), it is possible to find an approximate solution for which the norm of the discrepancies of the constraints, and also the difference between the actual optimum and the value of the target function obtained, do not exceed the square root of the chosen value of the parameter $\varepsilon_2$.


Translated by J. Berry. 
DUALITY IN MULTI-TARGET PROGRAMMING*
V. D. NOGIN 
Leningrad 
(Received 14 March 1975) 


A DUAL problem of finite-dimensional multi-target programming is constructed for the most 
general assumptions on the vector functions to be maximized and the constraints. The concave 


case is considered in detail. 


Extremal problems of multi-target programming, in which the maximization is performed 
not with respect to one, but with respect to several target functions, have recently been 
intensively studied. A whole set of maximal (efficient, Pareto optimal, non-improvable) elements 
usually emerges as the solution of such problems. At the present time the theory of multi-target 
programming has not yet arrived at the same stage of development as the ordinary theory of 
mathematical programming, however it appears to contain fairly interesting and difficult 
mathematical problems [1—4] and is of interest from the point of view of applications in the 
most diverse fields: in economics, the theory of games, the theory of optimal decision making 
and in all problems of the choice of optimal solutions with incongruent criteria. 


It is known that one of the most important ideas of mathematical programming theory 
is the idea of duality, by which a correspondence is established between the original extremal 
problem and the dual problem closely associated with it. A joint study of both problems is 
fruitful both for the development of numerical algorithms and also for qualitative investigations 
of extremal problems. At the present time a comparatively detailed study has been made of 
duality in ordinary mathematical programming (see, for example, [5] ). In multi-target 
programming duality was considered only in [6, 7]. The dual problem of finding maximal
elements in the completely linear case was first formulated in [6]. In [7] a dual problem was
constructed under assumptions of concavity and differentiability of the target vector functions
and constraints.


In the present paper the dual problem is formulated under the most general assumptions by
means of a vector Lagrange function and a certain non-transitive binary relation on a set of
values of the Lagrange vector function. The present treatment of duality differs from the
approach in [7] and includes as a particular case (when the target vector function degenerates
into a scalar function) the approach developed in [8]. In the case where the vector function
to be maximized and the constraints are concave (and not necessarily differentiable), conditions
are given under which the set of solutions of the direct problem is identical with the set of
solutions of the dual problem. It is shown that in the linear case the dual problem is similar
to the dual problem considered in [6].


1. Let $a = (a_1, \ldots, a_m)$, $b = (b_1, \ldots, b_m)$. We agree that

$$a \geqq b \iff a_i \geq b_i, \quad i = 1, 2, \ldots, m,$$
$$a \geq b \iff a \geqq b \ \text{and} \ a \neq b,$$
$$a > b \iff a_i > b_i, \quad i = 1, 2, \ldots, m,$$
$$a \trianglerighteq b \iff b \ngeq a.$$

It is obvious that the relation $a \trianglerighteq b$ is satisfied if and only if either a = b, or a subscript
$i \in \{1, 2, \ldots, m\}$ is found such that $a_i > b_i$. It should be noted that the binary relation
$\trianglerighteq$ is not transitive for m > 1.
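The four relations can be checked mechanically on small vectors. The following Python sketch is illustrative only: the function names `geqq`, `geq`, `gt` and `trigeq` are not from the paper, and the fourth relation is coded as "b ≥ a fails", which matches the characterization just stated.

```python
def geqq(a, b):
    """Componentwise >= (every coordinate of a is at least that of b)."""
    return all(x >= y for x, y in zip(a, b))

def geq(a, b):
    """Componentwise >= together with a != b."""
    return geqq(a, b) and tuple(a) != tuple(b)

def gt(a, b):
    """Strict inequality in every coordinate."""
    return all(x > y for x, y in zip(a, b))

def trigeq(a, b):
    """Fourth relation: holds exactly when geq(b, a) fails."""
    return not geq(b, a)

# Non-transitivity for m = 2: the first two relations hold, the third does not.
a, b, c = (0, 0), (-1, 5), (1, 1)
print(trigeq(a, b), trigeq(b, c), trigeq(a, c))
```

The triple printed for this choice of vectors shows two links of the chain holding while the end-to-end relation fails.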


Definition 1. The finite element $a^0 \in A$, $A \subset E^m$, is called a maximal element of the
set A, if the satisfaction for some $a \in A$ of the inequality $a \geqq a^0$ implies that $a = a^0$.

Definition 2. The element $a^0 \in A$ is called minimal, if $-a^0$ is a maximal element
of the set $-A$.

It is easy to verify that the following definition is equivalent to definition 1.

Definition 1'. A finite element $a^0 \in A$ is a maximal element of the set A if $a^0 \trianglerighteq a$
(that is, either $a^0 = a$ or $a^0_i > a_i$ for some i) is satisfied for any $a \in A$.
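For a finite set A the definitions can be applied by direct enumeration. The sketch below is an illustration under that finiteness assumption (the paper's sets are in general infinite); the helper names are hypothetical.

```python
def dominates(a, b):
    """True when a >= b in every coordinate and a != b."""
    return all(x >= y for x, y in zip(a, b)) and tuple(a) != tuple(b)

def maximal_elements(points):
    """Elements a0 of a finite set that no other element dominates (definition 1')."""
    return [a0 for a0 in points if not any(dominates(a, a0) for a in points)]

def minimal_elements(points):
    """Definition 2: a0 is minimal when -a0 is maximal in the negated set."""
    negated = [tuple(-x for x in p) for p in points]
    return [tuple(-x for x in p) for p in maximal_elements(negated)]

A = [(1, 4), (2, 2), (3, 1), (1, 1), (2, 1)]
print(maximal_elements(A))   # [(1, 4), (2, 2), (3, 1)]
print(minimal_elements(A))   # [(1, 1)]
```

The quadratic scan is adequate for small sets; it is not meant as an efficient Pareto filter.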


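We will suppose that the vector functions $F(x) = (f_1(x), \ldots, f_m(x))$, $G(x) = (g_1(x), \ldots, g_k(x))$
are defined on the set $X \subset E^n$. We put

$$D = \{x \in X \mid G(x) \geqq 0_k\},$$
$$L(x, \lambda) = (L_1(x, \lambda), \ldots, L_m(x, \lambda)),$$
$$L_j(x, \lambda) = f_j(x) + (\lambda, G(x)), \quad j = 1, 2, \ldots, m,$$
$$\Lambda = \{\lambda \in E^k \mid \lambda \geqq 0_k\},$$

where $(\lambda, G(x))$ denotes the scalar product of the vectors $\lambda$ and G(x). We will also assume that
the vector Lagrange function $L(x, \lambda)$ is defined on the set $X \times \Lambda$ and that $D \neq \emptyset$.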


2. Direct problem I. Find the maximal elements of the set

$$P = \bigcup_{x \in X} P(x), \qquad P(x) = \bigcap_{\lambda \in \Lambda} \{p \in E^m \mid p \leqq L(x, \lambda)\}.$$

The solutions of this problem are identical with the maximal elements of the set

$$\Bigl\{\bigcup_{x \in D} P(x)\Bigr\} \cup \Bigl\{\bigcup_{x \in X \setminus D} P(x)\Bigr\}.$$

For $x \in X \setminus D$ every component of the vector $p \in P(x)$ is unbounded below, therefore the
set of solutions of the direct problem is identical with the set of maximal elements of the set
$\bigcup_{x \in D} P(x)$. Also, as is easily verified, if $p^0 \in P(x^0)$ is a solution of the direct problem,
then $p^0 = F(x^0)$ for some $x^0 \in D$. Therefore the direct problem I is equivalent to the
direct problem II.


Direct problem II. Find the set of maximal elements of the set

$$\bigcup_{x \in D} F(x). \qquad (2)$$

In this form the direct problem has the standard form of the multi-target programming
problem of finding efficient elements.


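Dual problem I. Find the minimal elements of the set

$$H = \bigcup_{\lambda \in \Lambda} H(\lambda), \qquad (3)$$

$$H(\lambda) = \bigcap_{x \in X} \{h \in E^m \mid h \trianglerighteq L(x, \lambda)\}.$$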


Lemma 1

The relation $a \trianglerighteq b$ for $a, b \in E^m$ is satisfied if and only if a vector $\mu > 0_m$ exists for
which $(\mu, a) \geq (\mu, b)$ and

$$\sum_{i=1}^{m} \mu_i = 1. \qquad (4)$$

Proof. Necessity. Let

$$E^- = \{x \in E^m \mid x \leq 0_m\}.$$

By the condition of the lemma the point $a - b$ does not belong to the convex set $E^-$. It is easy
to see that the set $E^-$ may be separated from the point $a - b$ by some hyperplane
$(\mu, x) = 0$, where

$$(\mu, x) < 0 \quad \forall x \in E^-, \qquad (\mu, a - b) \geq 0.$$

The first inequality and the definition of the set $E^-$ imply $\mu > 0_m$. It is obvious that the
vector $\mu$ can be so chosen that (4) is satisfied.

Sufficiency. If $(\mu, a) \geq (\mu, b)$, then by the inequality $\mu > 0_m$, the assumption $a \leq b$
implies the contradiction: $(\mu, a) < (\mu, b)$.
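The necessity part of the lemma can be made constructive. The sketch below is one possible construction, not the separation argument of the proof: it puts most of the weight on a coordinate where a exceeds b and a small equal weight elsewhere. The function name `separating_weights` and the particular choice of the small weight are assumptions made for illustration; exact rational arithmetic is used so that the checks are not affected by rounding.

```python
from fractions import Fraction

def separating_weights(a, b):
    """Return mu > 0 with sum(mu) == 1 and (mu, a) >= (mu, b) when either
    a == b or a[i] > b[i] for some i; return None when no such mu exists."""
    m = len(a)
    d = [Fraction(x) - Fraction(y) for x, y in zip(a, b)]
    if all(v == 0 for v in d):
        return [Fraction(1, m)] * m          # any normalized positive mu works
    positive = [i for i, v in enumerate(d) if v > 0]
    if not positive:
        return None                          # a <= b with a != b
    if m == 1:
        return [Fraction(1)]
    i = positive[0]
    deficit = sum(-v for v in d if v < 0)    # total amount by which a falls below b
    eps = d[i] / (2 * ((m - 1) * d[i] + deficit))
    mu = [eps] * m
    mu[i] = 1 - (m - 1) * eps                # at least 1/2 by the choice of eps
    return mu

mu = separating_weights((0, 0), (-1, 5))
print(mu)
```

With this choice the weighted difference $(\mu, a - b)$ is at least half of the positive coordinate difference, so the required inequality holds with room to spare.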


By Lemma 1 the dual problem I is equivalent to the dual problem II.

Dual problem II. Find the minimal elements of the set (3), where

$$H(\lambda) = \bigcap_{x \in X} \bigcup_{\mu \in M} \{h \in E^m \mid (\mu, h) \geq (\mu, F(x)) + (\lambda, G(x))\},$$

$$M = \Bigl\{\mu \in E^m \Bigm| \mu > 0_m, \ \sum_{i=1}^{m} \mu_i = 1\Bigr\}.$$
We will now explain the connection between the direct and dual problems.

Lemma 2

For any $p \in P$, $h \in H$ the relation $h \trianglerighteq p$ holds.

Proof. We assume the contrary: for some pair of elements $p^0 \in P$, $h^0 \in H$ the inequality
$p^0 \geq h^0$ holds. Let $p^0 \in P(x^0)$, $h^0 \in H(\lambda^0)$, where $x^0 \in X$, $\lambda^0 \in \Lambda$. By the definition of the
sets P(x), $H(\lambda)$ we obtain

$$L(x^0, \lambda^0) \geqq p^0 \geq h^0 \trianglerighteq L(x^0, \lambda^0),$$

which implies the contradiction: for some $i \in \{1, 2, \ldots, m\}$ we obtain $L_i(x^0, \lambda^0) > L_i(x^0, \lambda^0)$.

The relation $h \trianglerighteq p$ holds both for the solutions $p^0$ of the direct and for the solutions
$h^0$ of the dual problems, that is, $h^0 \trianglerighteq p^0$, which is the vector analog of the known inequality
$\max_x \min_\lambda L(x, \lambda) \leq \min_\lambda \max_x L(x, \lambda)$ for the scalar Lagrange function.


Corollary. The set of elements which are simultaneously solutions of the direct and dual
problems is of the form $P \cap H$. The necessity of the assertion is obvious, and the sufficiency
follows from Lemma 2 and definition 1' of the maximal (minimal) element.


3. Here continuity and concavity of the vector functions F(x), G(x) on the convex set X 
will be assumed. 


Definition 3. The maximal element $a^0$ of the set A is said to be properly maximal, if a
vector $\mu \in M$ exists such that the inequality $(\mu, a^0) \geq (\mu, a)$ is satisfied for all $a \in A$.


Remark 1. Usually by a properly maximal element is understood the value of the vector 
function F at a properly efficient point [2]. But, as shown in [2], when the assumptions of 
continuity and concavity are satisfied this concept is equivalent to the properly maximal element 
of definition 3. 


Lemma 3 


We suppose that a point $x \in X$ has been found such that $g_i(x) > 0$ for all non-linear
functions $g_i$. Then every properly maximal element of the set P is a solution of the dual problem.

Proof. Let $p^0$ be a properly maximal element of the set P. Because of the maximality,
$p^0 = F(x^0)$ for some $x^0 \in D$, and since $p^0$ is properly maximal, then by the results of [2],
vectors $\mu^0 \in M$, $\lambda^0 \in \Lambda$ can be found such that for all $x \in X$ the inequality

$$(\mu^0, F(x^0)) \geq (\mu^0, L(x, \lambda^0))$$

is satisfied. Accordingly, $p^0 \in P \cap H$, and applying the corollary of Lemma 2, we obtain that $p^0$
is a solution of the dual problem.


Theorem 1 


Let the regularity condition of Lemma 3 be satisfied. We will assume: a) every maximal
element of the set P is properly maximal, or b) the set X is closed, there exists at least one
properly maximal element of the set P and the limit of any convergent sequence of solutions
of the dual problem belongs to the set H. Then every solution of the direct problem is a
solution of the dual problem.

Proof. In the case when condition a) is satisfied Lemma 3 applies. Let b) hold. It is
obvious that the set D is convex and closed; therefore, by the remark of [2], pp. 623-624, for a
concave and continuous vector function F(x) the set of maximal elements of P is contained in the
closure of the set of properly maximal elements of P. The limit of any convergent sequence
of properly maximal elements belongs to the closed set P and, thanks to Lemma 3 and the
hypothesis of the theorem, belongs to the set H. Consequently, every solution of the direct
problem is a solution of the dual problem.


Remark 2. It is obvious that when m = 1 condition a) is necessarily satisfied. V. V.
Podinovskii has proved that for $m \geq 1$ condition a) holds for linear F, G (this result was also
published in [9]). The author has established (the proof is too unwieldy to include here)
the proper maximality of the maximal elements in the non-linear
case, when F is simultaneously concave and pseudo-concave, and G is simultaneously
quasi-concave and pseudo-convex.


Theorem 2 


Let the regularity condition hold, the set X be closed and let there exist at least one
properly maximal element of the set P. Then every solution of the dual problem is a solution
of the direct problem.


Proof. We suppose that $h^0 \in H$ is a solution of the dual problem. Since $h^0$ is a minimal
element of the set H, we have $h^0 \notin \operatorname{int} H$.

We prove the inclusion $\bar P \subset \operatorname{int} H$, where $\bar P$ denotes the complement of P in $E^m$. Because
of the assumptions of the theorem the set P is convex and closed. Hence, if $h' \in \bar P$, then an
$\varepsilon > 0$ exists such that the sphere $B_\varepsilon(h')$ of radius $\varepsilon$ with centre at the point $h'$ does not intersect
the set P. In this case the sets P and $B_\varepsilon(h')$ may be strongly separated, that is, there exists a
non-zero vector $\mu \in E^m$ such that

$$(\mu, h) > \alpha \quad \forall h \in B_\varepsilon(h'), \qquad (\mu, p) < \alpha \quad \forall p \in P.$$

By the definition of the set P we obtain from the second inequality $\mu \geq 0_m$. We may
obviously consider that (4) is satisfied. Let $p^0$ be a properly maximal element of the set P and
$(\mu^0, p^0) \geq (\mu^0, p)$ for all $p \in P$ and for some $\mu^0 \in M$. We consider the vector

$$\mu^\omega = \omega \mu^0 + (1 - \omega) \mu,$$

belonging to the set M for any $\omega \in (0, 1)$. Obviously,

$$(\mu^\omega, p^0) = \omega (\mu^0, p^0) + (1 - \omega)(\mu, p^0),$$
$$(\mu^\omega, h) = \omega (\mu^0, h) + (1 - \omega)(\mu, h).$$

As a consequence of the fact that the linear function $(\mu^0, h)$ is bounded on the set $B_\varepsilon(h')$ and
$(\mu, h) > (\mu, p^0)$ for all $h \in B_\varepsilon(h')$, the positive $\omega$ can be taken so small that for all $h \in B_\varepsilon(h')$
the inequality

$$(\mu^\omega, h) \geq (\mu^\omega, p^0)$$

is satisfied. By Lemma 3, $p^0$ is a solution of the dual problem, and hence $p^0 \in H$. This
together with the above inequality gives $h' \in \operatorname{int} H$.

By the inclusion proved, $h^0 \in P \cap H$, which together with the corollary to Lemma 2
shows that the element $h^0$ belongs to the set of solutions of the direct problem.


We consider the linear case. Let $X = E^n$, $F(x) = Cx$, $G(x) = b - Ax$, where C and A are
m × n and k × n matrices respectively. Using the fact of the coincidence in the linear case of
the maximal and properly maximal elements, and of the necessary and sufficient conditions
of proper maximality [2], we arrive at the following formulation of the dual problem: find
the minimal elements of the set

$$\bigcup_{\lambda \in \Lambda} \bigcup_{\mu \in M} \{h \in E^m \mid (\mu, h) \geq (\lambda, b)\}$$

subject to the condition $\mu C = \lambda A$. In this form the linear dual problem was formulated in [6].
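The origin of the side condition can be seen from a short calculation. The following is a heuristic sketch for a fixed pair of row vectors $\mu \in M$, $\lambda \in \Lambda$ (fixing $\mu$ independently of x is an assumption made here for illustration; the paper itself appeals to the conditions of proper maximality in [2]).

```latex
% Linear case: F(x) = Cx, G(x) = b - Ax, X = E^n, with \mu and \lambda held fixed.
\begin{align*}
(\mu, h) &\geq (\mu, Cx) + (\lambda, b - Ax)
          = (\lambda, b) + (\mu C - \lambda A)\,x
          \qquad \text{for all } x \in E^n .
\end{align*}
% If \mu C \neq \lambda A, the right side is unbounded above in x,
% so no finite h satisfies the inequality for every x.
% If \mu C = \lambda A, the inequality reduces to (\mu, h) \geq (\lambda, b).
```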


Translated by J. Berry.
ALGORITHM FOR SOLVING THE LINEAR PROGRAMMING PROBLEM
BY THE LOADED FUNCTIONAL METHOD* 


A. P. ABRAMOV and Yu. P. IVANILOV 
(Received 25 March 1976) 


A MODIFICATION of the loaded functional method proposed in [1, 2], using the trajectory 
of the local minima, is discussed. The algorithm finds the solution of the linear programming 
problem after a finite number of steps. 

Consider the linear programming problem in the form

$$cx \to \max, \qquad Ax \leq b. \qquad (1)$$

Here $c^T, x \in R^n$, $b \in R^m$, and A is an m × n matrix. It is assumed that the vector c is non-zero
and the set of solutions of problem (1) is not empty. The problem dual to (1) has the form

$$pb \to \min, \qquad pA = c, \quad p \geq 0, \quad \text{where } p^T \in R^m. \qquad (2)$$


We take an arbitrary number $\omega$ and construct the loaded functional (see [1, 2]):

$$\psi(x, \omega) = (\omega - cx)_+^2 + \sum_{i=1}^{m} (a_i x - b_i)_+^2, \qquad (3)$$

where $z_+ = \max\{0, z\}$ is the truncation of the function z, and $a_i$ is the i-th row of the matrix A.
For a fixed value of $\omega$ the function $\psi(x, \omega)$ is convex and continuously differentiable in $R^n$
and bounded below by zero. For any $\omega$ there exists

$$f(\omega) = \min_{x \in R^n} \psi(x, \omega). \qquad (4)$$

This minimum equals zero if $\omega$ does not exceed the optimal value $\omega^* = cx^*$ of the functional
of problem (1). Any point x at which a zero minimum is attained is a permissible solution of
problem (1). But if $\omega > cx^*$, then $f(\omega)$ is a positive and monotonically increasing function.
Moreover, because of the convexity of $\psi(x, \omega)$ with respect to the set of variables $(x, \omega)$, the
function $f(\omega)$ is convex. For $\omega > cx^*$ at any point x where the loaded functional (3) attains a
minimum the inequality

$$\omega - cx > 0 \qquad (5)$$

is satisfied.
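The loaded functional (3) is simple to evaluate directly. The following plain-Python sketch is illustrative; the function name `psi` and the one-variable toy problem (maximize x subject to x ≤ 1, so that $cx^* = 1$) are assumptions introduced here, not taken from the paper.

```python
def psi(x, omega, c, A, b):
    """Loaded functional (3): squared positive parts of omega - cx and of a_i x - b_i."""
    def plus(z):
        return max(0.0, z)
    cx = sum(cj * xj for cj, xj in zip(c, x))
    value = plus(omega - cx) ** 2
    for a_i, b_i in zip(A, b):
        value += plus(sum(aij * xj for aij, xj in zip(a_i, x)) - b_i) ** 2
    return value

# Toy problem: maximize x subject to x <= 1.
c, A, b = [1.0], [[1.0]], [1.0]
print(psi([1.0], 1.0, c, A, b))   # 0.0
print(psi([2.0], 3.0, c, A, b))   # 2.0
```

For the toy problem, $\omega = 1$ gives a zero minimum at the feasible point x = 1, while for $\omega = 3$ the value 2 at x = 2 is the minimum over x, so $f(3) > 0$.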


Therefore, problem (1) reduces to a search for the largest root $\omega^*$ of the function $f(\omega)$.
This root could be found by Newton's method. However, the convergence of this method in the
neighborhood of $\omega^*$ is slow, since $f(\omega)$ vanishes at $\omega = \omega^*$ together with its derivative. The
method proposed in [3] gives at each iteration an improved value of the root of the function
$f(\omega)$ with a step twice as great as in Newton's method, and in the neighborhood of $\omega^*$ this
method is identical with the second-order Newton method. In both Newton's method and the
method proposed in [3] it is necessary at the k-th step of the algorithm to determine the value
$f(\omega^k)$ by minimizing the function $\psi(x, \omega^k)$, which for a large dimension of problem (1)
is an extremely laborious operation. Below we present an algorithm for finding the largest
root of the function $f(\omega)$ using the trajectory of the local minima $x(\omega)$, where by definition

$$\psi(x(\omega), \omega) = \min_{x \in R^n} \psi(x, \omega), \qquad (6)$$

which at each step permits us to solve a problem of smaller dimension than problem (4).


Let some number $\bar\omega > cx^*$ and the vector $\bar x = x(\bar\omega)$ be given. We assume that the
point $\bar x$ is such that in some neighborhood of it the set of constraints $I \subset \{1, 2, \ldots, m\}$ of
problem (1), and only this set, is violated, that is,

$$a_i x - b_i > 0, \quad i \in I, \qquad a_i x - b_i < 0, \quad i \notin I. \qquad (7)$$

Then in this neighborhood of the point $\bar x$ the loaded functional (3), taking into account (5) and
(7), is written as follows:

$$\psi(x, \omega) = (\omega - cx)^2 + \sum_{i \in I} (a_i x - b_i)^2.$$


The necessary and sufficient condition for a minimum of the function $\psi(x, \bar\omega)$ at the point
$\bar x$ is the vanishing of the derivative $\psi_x'(\bar x, \bar\omega)$ at this point. We represent this condition in the
form

$$\Bigl(c^T c + \sum_{i \in I} a_i^T a_i\Bigr) x = \omega c^T + \sum_{i \in I} b_i a_i^T. \qquad (8)$$

Condition (8) specifies explicitly the trajectory $x(\omega)$, which is called the trajectory of local
minima [4]. Therefore, instead of solving problem (4) we can solve the system of equations

$$-(\omega - cx)\, c^T + \sum_{i \in I} (a_i x - b_i)\, a_i^T = 0. \qquad (9)$$


We calculate the value of the function $\psi(x, \omega)$ at the point $(\bar x + \rho\xi, \bar\omega + \rho)$, where the
n-dimensional vector $\xi$ defines the direction of the shift in the space of the x's. If the
negative number $\rho$ is chosen sufficiently small in modulus, then at the point $\bar x + \rho\xi$ the
set of constraints I, and only this set, will be violated. Performing elementary transformations, we obtain

$$\psi(\bar x + \rho\xi, \bar\omega + \rho) - \psi(\bar x, \bar\omega) = 2\Bigl[-(\bar\omega - c\bar x)\, c + \sum_{i \in I} (a_i \bar x - b_i)\, a_i\Bigr] \rho\xi + 2(\bar\omega - c\bar x)\rho + \Bigl[\sum_{i \in I} (a_i \xi)^2 + (1 - c\xi)^2\Bigr] \rho^2,$$

from which by condition (9) we have

$$\psi(\bar x + \rho\xi, \bar\omega + \rho) - \psi(\bar x, \bar\omega) = 2(\bar\omega - c\bar x)\rho + \Bigl[\sum_{i \in I} (a_i \xi)^2 + (1 - c\xi)^2\Bigr] \rho^2. \qquad (10)$$

It follows from (10) that for negative values of $\rho$ sufficiently small in modulus, the expression
on the right in (10) is negative. Therefore, the problem arises of choosing the best direction $\xi \in R^n$
such that

$$\sum_{i \in I} (a_i \xi)^2 + (1 - c\xi)^2 \to \min. \qquad (11)$$


This problem is equivalent to solving the system of equations

$$\Bigl(c^T c + \sum_{i \in I} a_i^T a_i\Bigr) \xi = c^T. \qquad (12)$$

We differentiate the system of equations (8) with respect to $\omega$:

$$\Bigl(c^T c + \sum_{i \in I} a_i^T a_i\Bigr) \frac{dx}{d\omega} = c^T. \qquad (13)$$

Comparison of systems (12) and (13) shows that the optimal value $\xi = \bar\xi$ is identical with
$dx/d\omega$, the solution of system (13).


Remark 1. We note that the derivative $dx/d\omega$ does not always exist. For some values
of $\omega$ the left and right derivatives have different values (see below).


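Taking into account (12), relation (10) reduces to the form

$$\psi(\bar x + \rho\bar\xi, \bar\omega + \rho) - \psi(\bar x, \bar\omega) = 2(\bar\omega - c\bar x)\rho + (1 - c\bar\xi)\rho^2. \qquad (14)$$

We note that (10) and (14) imply that $1 - c\bar\xi \geq 0$. Multiplying the transposed system (12) for
$\xi = \bar\xi$ by $\bar\xi$ and transforming, we have

$$\sum_{i \in I} (a_i \bar\xi)^2 = (1 - c\bar\xi)\, c\bar\xi.$$

Consequently, $0 \leq c\bar\xi \leq 1$ and $0 \leq 1 - c\bar\xi \leq 1$. It is easy to show that

$$[\bar\omega + \rho - c(\bar x + \rho\bar\xi)]^2 - (\bar\omega - c\bar x)^2 = (1 - c\bar\xi)\, \Delta\psi,$$

$$\sum_{i \in I} [a_i(\bar x + \rho\bar\xi) - b_i]^2 - \sum_{i \in I} (a_i \bar x - b_i)^2 = c\bar\xi\, \Delta\psi,$$

where $\Delta\psi$ denotes the left side in (14), that is, the discrepancies of each of the two terms of
the functional $\psi(x, \omega)$ are reduced for $\rho < 0$.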
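The identity obtained from system (12), and with it the coefficient of $\rho^2$ in (14), can be verified in a few lines. The following worked step is a sketch; the abbreviation B for the matrix of system (12) is introduced here only for convenience.

```latex
% Let B = c^T c + \sum_{i \in I} a_i^T a_i, so that (12) reads B\xi = c^T.
\begin{align*}
\xi^T B \xi &= (c\xi)^2 + \sum_{i \in I} (a_i \xi)^2 = c\xi
  && \text{(multiply (12) on the left by } \xi^T\text{)} \\
\sum_{i \in I} (a_i \xi)^2 &= c\xi - (c\xi)^2 = (1 - c\xi)\, c\xi \\
\sum_{i \in I} (a_i \xi)^2 + (1 - c\xi)^2 &= (1 - c\xi)\bigl(c\xi + 1 - c\xi\bigr) = 1 - c\xi
\end{align*}
```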
We specify $\rho = \omega - \bar\omega$ in such a way that the point $x = \bar x + \rho\bar\xi$ violates the set of
constraints I, and it alone. Substitution of $\omega$ and x into system (8) shows that the point x lies
on the trajectory of local minima of the function $\psi(x, \omega)$. Accordingly, this trajectory is
piecewise linear, the kinks corresponding to the vanishing of $a_i x - b_i$.

This algorithm realizes the approximation to the optimal point $x^*$ along the trajectory of
local minima. We will give a formal description of it.
Step 0. For the specified $\omega^0 > cx^*$ calculate the initial point $x^0$:

$$\psi(x^0, \omega^0) = \min_{x \in R^n} \psi(x, \omega^0).$$

Step 1. Put k = 0.

Step 2. Determine the set of violated constraints $I^k$ of problem (1) at the point $x^k$.

Step 3. At the point $\omega^k$ calculate the left derivative

$$(dx/d\omega)\big|_{\omega^k - 0} = \xi^k.$$

It can be determined by solving system (13) or the equivalent problem (11), which is of smaller
dimension than problem (4) for finding the minimum of the function $\psi(x, \omega)$.

Step 4. Put

$$x^{k+1} = x^k + (\omega^{k+1} - \omega^k)\, \xi^k, \qquad \omega^{k+1} = \max_{i \in I^k} \omega_i^{k+1},$$

where the $\omega_i^{k+1}$ are determined from the relations

$$a_i[x^k + (\omega_i^{k+1} - \omega^k)\, \xi^k] - b_i = 0, \quad i \in I^k.$$

Step 5. Calculate the quantity $\psi(x^{k+1}, \omega^{k+1})$. If $\psi(x^{k+1}, \omega^{k+1}) > 0$, then put k = k + 1
and pass to step 2; if $\psi(x^{k+1}, \omega^{k+1}) = 0$, then put $x^* = x^{k+1}$, $\omega^* = \omega^{k+1}$ and stop.
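Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the matrix of the linear system is taken to be $c^T c + \sum_{i \in I} a_i^T a_i$ with right side $c^T$, it is assumed non-singular, only constraints with $a_i \xi > 0$ are used to locate the next kink, and the names `solve`, `system_matrix`, `trajectory_point` and `step` are hypothetical. Exact rational arithmetic is used so that the toy example is free of rounding effects.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M y = rhs by Gauss-Jordan elimination (M assumed non-singular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def system_matrix(c, A, I):
    """c^T c + sum over i in I of a_i^T a_i."""
    n = len(c)
    return [[c[r] * c[s] + sum(A[i][r] * A[i][s] for i in I) for s in range(n)]
            for r in range(n)]

def trajectory_point(c, A, b, omega, I):
    """Point x(omega) on the trajectory of local minima for a given violated set I."""
    n = len(c)
    rhs = [omega * c[r] + sum(b[i] * A[i][r] for i in I) for r in range(n)]
    return solve(system_matrix(c, A, I), rhs)

def step(c, A, b, x, omega):
    """Steps 2-4: find the violated set, the direction xi, and move to the next kink."""
    I = [i for i in range(len(A)) if dot(A[i], x) - b[i] > 0]
    xi = solve(system_matrix(c, A, I), c)
    # omega_i at which constraint i stops being violated (assumes a_i xi > 0)
    candidates = [omega - (dot(A[i], x) - b[i]) / dot(A[i], xi)
                  for i in I if dot(A[i], xi) > 0]
    omega_new = max(candidates)
    x_new = [xj + (omega_new - omega) * xij for xj, xij in zip(x, xi)]
    return x_new, omega_new

# Toy problem: maximize x1 + x2 subject to x1 <= 1, x2 <= 1.
c, A, b = [1, 1], [[1, 0], [0, 1]], [1, 1]
omega0 = 4
x0 = trajectory_point(c, A, b, omega0, I=[0, 1])
x1, omega1 = step(c, A, b, x0, omega0)
print(x1, omega1)
```

For this toy problem both constraints are violated at the starting point, the rows are linearly independent, and a single step reaches x = (1, 1) with $\omega = 2$, which is the optimum.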


We will investigate the algorithm. It follows from (9) that the row-vector $\bar p$ whose i-th
coordinate is defined by the relation

$$\bar p_i = \begin{cases} (a_i \bar x - b_i)/(\bar\omega - c\bar x), & i \in I, \\ 0, & i \notin I, \end{cases} \qquad (15)$$

is a permissible vector of the dual problem (2). If the vectors $a_i$, $i \in I$, are linearly independent,
then (12) and (15) imply that

$$a_i \bar\xi = \frac{(1 - c\bar\xi)(a_i \bar x - b_i)}{\bar\omega - c\bar x}, \quad i \in I. \qquad (16)$$

In this case all the $a_i \bar\xi > 0$, $i \in I$, therefore for $\rho < 0$ all the discrepancies $[a_i(\bar x + \rho\bar\xi) - b_i]$,
$i \in I$, decrease. It follows from (14) that $\rho = -(\bar\omega - c\bar x)/(1 - c\bar\xi)$ minimizes the right side
of (14), but (16) implies that $\rho = -(a_i \bar x - b_i)/a_i \bar\xi$, all the discrepancies $[a_i(\bar x + \rho\bar\xi) - b_i]$
vanishing. Therefore, if at some stage the rows $a_i$, $i \in I$, are linearly independent, then the
step $\rho$ is realized and the new value $x = \bar x + \rho\bar\xi$ is the required optimal point $x^*$. This obviously
corresponds to the last step of the operation of the algorithm.


We prove the convergence of the algorithm. It follows from the description that its
operation consists of the sequential choice of points $\omega^0 > \ldots > \omega^k > \ldots$. On the segment
$[\omega^*, \omega^0]$ the function $f(\omega)$ is non-negative, continuous and monotonically increasing, therefore
the limit

$$\lim_{k \to \infty} f(\omega^k) = f(\tilde\omega) \geq 0$$

exists. If $f(\tilde\omega) = 0$, then $\tilde\omega = \max\{\omega \mid f(\omega) = 0\} = \omega^*$. The attainment of the point $\omega^*$ after a
finite number of iterations of the algorithm is guaranteed in this case by the fact that, as
demonstrated, the last step of the operation of the algorithm is finite. We establish that the case
$f(\tilde\omega) > 0$ is impossible. From the continuity of the trajectory of local minima it follows that
the point

$$\tilde x = \lim_{k \to \infty} x(\omega^k)$$

lies on the trajectory and corresponds to the value $\tilde\omega$, that is, $\tilde x = x(\tilde\omega)$. The point $\tilde x$ must
belong to at least one of the hyperplanes of the constraints of problem (1), otherwise it would
be within the segment $[x^k, x^{k+1}]$, corresponding to some k-th iteration of the algorithm, and
the relation $\omega^k > \tilde\omega > \omega^{k+1}$ would hold, which contradicts the condition $\omega^k > \tilde\omega$ for any
k. For $f(\tilde\omega) > 0$ at the point $\tilde x$ at least one of the constraints of problem (1) is violated,
and at the next step of the algorithm some $\omega^k < \tilde\omega$ will be chosen, which contradicts the
definition of the point $\tilde\omega$ as the limit of a sequence of points $\omega^k > \tilde\omega$ for any k.


Remark 2. Everywhere above it has been assumed that the matrices on the left sides of
the systems of linear equations (8) and (12) are non-singular. If at some iteration k this matrix
is singular, then in the convex polyhedral set in which the set $I^k$ of constraints of problem (1),
and only this set, is violated, the trajectory of the local minima assumes the form of a cone. It is
obvious that the solution of problem (1) in this case is non-unique. The calculations must be
interrupted and the relative disposition of the hyperplanes of the functional and the set $I^k$
investigated.


Remark 3. A version of the algorithm is possible in which $x^{k+1}$ is calculated by the
formula $x^{k+1} = x^k + \rho^k \xi^k$, where $\rho^k = -(\omega^k - cx^k)/(1 - c\xi^k)$ minimizes the right side in (14).
With this choice of step the point $x^{k+1}$ may not lie on the trajectory of local minima. To
calculate $x(\omega^{k+1})$, where $\omega^{k+1} = \omega^k + \rho^k$, it is necessary to calculate the minimum of the
function $\psi(x, \omega^{k+1})$. Then $x^{k+1}$ is used as the initial approximation.


Translated by J. Berry. 
THE STABILITY AND ASYMPTOTIC ESTIMATION OF THE SOLUTION
OF THE INVERSE PROBLEM WITH A SMALL PARAMETER* 


I. V. SIMONOV 
Moscow 


(Received 14 May 1975) 


THE TOPICS of the stability with respect to a parameter of the solution of a class of inverse 
problems, and the estimation of the error of the solutions of these problems are investigated. 
Theorems are proved which reduce the problems posed to similar direct problems. 


We first consider a physical problem of an illustrative nature. Suppose a rigid infinite plate 
is incident normally at high velocity on the surface of a half-space occupied by a solid strongly 
porous medium. The picture of the resulting plane one-dimensional motion of the medium is as 
follows: into the interior of the medium a strong shock wave propagates, behind the front of 
the wave the matter is in a state of relaxation. The behaviour of the medium will be described 
by the non-linear equation of a shock adiabatic and linearly-elastic relaxation. Then the problem 
of calculating the motion of the medium admits of the following equivalent mathematical 
description. The pressure p, the mass velocity u, and the specific volume v in the domain
$0 < x < x_0(t)$, $t > 0$, of the Lagrangian variables x, t must satisfy the equations of motion, of
continuity and of linearly-elastic relaxation
Ou Op Ou dv Po oe 
ous ee pager €(pi—p) = a?vy~? (v—4) 
and the boundary conditions: at the plate-medium surface of separation for $x = 0$, $t > 0$ in the
form of the equation of motion of the plate


Ou 
me Ps w(0, 0) = uo; 
ot 


at the shock wave front for $x = x_0(t)$, $t > 0$ ($x_0(t)$ is the unknown boundary of the domain),
in the form of the equation of the shock adiabatic and the general relation at the discontinuity


P=pP-(V, Vo), Xo=vV(Voxo—U), p=Zou, Zo (0) =0. 


Here $p_1 = p_1(x)$ and $v_1 = v_1(x)$ are the local extremal values of p and v attained at the
front, $\varepsilon = a^2/c^2$, where $a = \dot x_0(0)$ (the dot indicates the time derivative), c is the constant
speed of sound at the front, $m_0$, $u_0$ are the mass per unit surface area of the plate and its
initial velocity, and $v_0$ is the initial value of the specific volume.


Putting $\varepsilon = 0$, it is possible to obtain the same problem for a medium with a stiff relief.
We note that for $\varepsilon \neq 0$ this system of equations is of hyperbolic type, for $\varepsilon = 0$ it is elliptic.


We introduce the function

$$q(x) = \lim_{t \to \infty} v(x, t),$$

describing the residual distribution of the specific volume, which is recorded in experiments. We
suppose that this limit exists. We will call the problems of determining q for a given $p_*$ for $\varepsilon \neq 0$
and $\varepsilon = 0$, problems 1 and 2 respectively.


Problem 1 can be solved by a difference method. The equations of problem 2 are integrated,
and q is determined as the solution of the equation [1]

$$p_*(q, v_0) = \frac{m_0^2 u_0^2}{(v_0 - q)(x + m_0)^2}.$$

The solution of the inverse problem of determining $p_*$ for a known q (problems
3, 4, for $\varepsilon \neq 0$ and $\varepsilon = 0$, respectively) is physically interesting. These are typical problems
of determining a function of state by indirect measurements. The solution of problem 4 is given
in implicit form [1]:

$$v = q(x), \qquad p = \frac{m_0^2 u_0^2}{(v_0 - q)(x + m_0)^2} \equiv F(q, x).$$


The algorithm for solving problem 3 is unknown. Nevertheless, it is desirable to estimate
the error of the solution of problem 4 due to simplifying the model of the medium. Below
a mathematical investigation of the situation discussed is presented. A stability theorem is proved
in which it is stated that for the stability of the solution of some class of inverse problems with
respect to a parameter the uniform stability of the solution of the corresponding direct problems
with respect to the same parameter is sufficient. The estimate of the error of the solution of the
inverse problem is obtained in terms of the norm of the error of the solution of the direct
problem. Thereby the questions of the stability and estimation in the inverse problem are
reduced to the similar problems in the direct problem.


1. We will start from the assumption that there exist perturbed and unperturbed
problems of determining a function q for a specified function p, and also the problems inverse
to them of determining p for a given q.


Let $A_\varepsilon$ and $A_0$ be the perturbed and unperturbed operators (in general non-linear) transforming
p into q, specified on some set P, and let $Q_\varepsilon$ and $Q_0$ be the domains of values of these operators
respectively, where $P \subset E_1$, $Q_\varepsilon \subset E_2$, $Q_0 \subset E_2$, and $E_1$ and $E_2$ are two linear normed spaces of
functions.


The perturbed operator $A_\varepsilon$ depends on the parameter $\varepsilon$ and is defined in the half-interval
$\mathcal{E} = (0, \varepsilon_0]$. The operator $A_0$ is obtained if in $A_\varepsilon$ we formally put $\varepsilon = 0$. We denote by $\beta$ the
difference

$$\beta(\varepsilon, p) = A_\varepsilon p - A_0 p \qquad (1.1)$$

and we will say that the operator $A_\varepsilon$ approximates the operator $A_0$ on the set P, if for any
function $p \in P$ the following condition is satisfied:

$$\beta(\varepsilon, p) \to 0 \quad \text{as } \varepsilon \to 0. \qquad (1.2)$$

Here and below convergence is understood in the strong sense as convergence in norm.
We suppose that each of the mappings $A_\varepsilon$ and $A_0$ is one-to-one, and let $A_\varepsilon^{-1}$ and
$A_0^{-1}$ be the inverse operators transforming q into p. We also suppose that the set
$Q = Q_0 \cap \bigl(\bigcap_{\varepsilon \in \mathcal{E}} Q_\varepsilon\bigr)$, the intersection of $Q_0$ and the whole family of sets $\{Q_\varepsilon, \varepsilon \in \mathcal{E}\}$,
is not empty, and we will use the notation

$$p_\varepsilon = A_\varepsilon^{-1} q, \qquad p_0 = A_0^{-1} q, \qquad \alpha(\varepsilon, q) = p_\varepsilon - p_0, \qquad q \in Q.$$

Therefore, if by means of each of the four operators introduced, $A_\varepsilon$, $A_0$, $A_\varepsilon^{-1}$, $A_0^{-1}$,
the solution of one of the four problems is constructed (the direct perturbed and unperturbed
and the inverse perturbed and unperturbed, respectively), then the quantities $\beta$ and $\alpha$ have the
meaning of the errors of the solutions of the unperturbed problems, direct and inverse respectively.
The approximation condition (1.2) denotes stability of the solution of the direct problem with
respect to the parameter $\varepsilon$.


We prove the following stability theorem. 


Theorem 1

Let the operator $A_0^{-1}$ be continuous on $Q_0$ and let $\beta(\varepsilon, p)$ tend uniformly to zero on the
set P as $\varepsilon \to 0$. Then the operator $A_0^{-1}$ approximates the perturbed operator $A_\varepsilon^{-1}$ on the
set Q as $\varepsilon \to 0$.


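Proof. To establish the connection between the quantities $\beta$ and $\alpha$ we derive the
following operator equation. By Eq. (1.1),

$$A_\varepsilon p_\varepsilon - A_0 p_\varepsilon = \beta(\varepsilon, p_\varepsilon).$$

From this, since the mappings are one-to-one, $A_\varepsilon p_\varepsilon = q$, $A_0 p_\varepsilon = q - \beta(\varepsilon, p_\varepsilon)$.

Applying $A_0^{-1}$ to both sides of the last equation, we obtain

$$p_\varepsilon = A_0^{-1}(q - \beta), \qquad \alpha(\varepsilon, q) = A_0^{-1}[q - \beta(\varepsilon, p_\varepsilon)] - A_0^{-1} q. \qquad (1.3)$$

Equations (1.3) imply that if the operator $A_0^{-1}$ is continuous and

$$\beta(\varepsilon, p_\varepsilon) \to 0 \quad \text{as } \varepsilon \to 0, \qquad (1.4)$$

then the statement of the theorem will be valid:

$$\alpha(\varepsilon, q) \to 0 \quad \text{as } \varepsilon \to 0, \quad q \in Q.$$

The condition of continuity of the operator $A_0^{-1}$ is stipulated in the formulation of
Theorem 1. The validity of (1.4) follows directly from the condition of uniform convergence of $\beta$.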


In the theorem it is asserted that for the stability of the solution of the inverse problem
with respect to a parameter the uniform stability of the solution of the direct problem with
respect to the same parameter is sufficient.

We note that the condition that $\beta$ tend uniformly to zero on the whole set P is too strong
a requirement. Indeed, to prove the assertion (1.4) for each fixed $q \in Q$ it is sufficient that
$\beta(\varepsilon, p) \to 0$ as $\varepsilon \to 0$ uniformly only on the one-parameter family of functions $\{p_\varepsilon = A_\varepsilon^{-1} q, \ \varepsilon \in \mathcal{E}\}$.
2. If the operator $A_0^{-1}$ is Fréchet differentiable on the set $Q_0$, then the right side of
(1.3) can be represented in the form

$$A_0^{-1}(q - \beta) - A_0^{-1} q = -A_q' \beta + \omega(q, \beta), \qquad (2.1)$$

where $A_q'$ is a linear bounded operator,

$$\|A_q' g\|_{E_1} \leq C \|g\|_{E_2}, \quad q \in Q_0,$$

and $\omega(q, \beta)$ satisfies the condition

$$\lim_{\|\beta\| \to 0} \bigl(\|\omega(q, \beta)\|_{E_1} \|\beta\|_{E_2}^{-1}\bigr) = 0.$$

The following theorem on estimation holds. 


Theorem 2 


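Let the conditions of Theorem 1 be satisfied, and also let the operator $A_0^{-1}$ be Fréchet
differentiable on the set $Q_0$ and let the following conditions be satisfied:

1) $\|\beta(\varepsilon, p)\|_{E_2} \neq 0$, $p \in P$, $\varepsilon \in \mathcal{E}$;

2) for any fixed function $p \in P$ and any sequence of functions $\{p_n\} \subset P$, n = 1, 2, ...,
such that $p_n \to p$ as $n \to \infty$, the sequence $\{\|\beta(\varepsilon, p_n)\| \|\beta(\varepsilon, p)\|^{-1}\}$ converges uniformly
on the set $\mathcal{E}$ to unity as $n \to \infty$.

Then as $\varepsilon \to 0$ the following asymptotic estimate holds ($q \in Q$, $p_0 = A_0^{-1} q$):

$$\|\alpha(\varepsilon, q)\|_{E_1} \leq C_1 \|\beta(\varepsilon, p_0)\|_{E_2}, \qquad C_1 = C + o(1).$$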


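Proof. From (1.3), (2.1), taking into account (1.4), we have

$$\|\alpha(\varepsilon, q)\| \leq C \|\beta(\varepsilon, p_\varepsilon)\| + \|\omega[q, \beta(\varepsilon, p_\varepsilon)]\|. \qquad (2.2)$$

We prove the validity of the representation

$$\|\beta(\varepsilon, p_\varepsilon)\| = \|\beta(\varepsilon, p_0)\| + \omega_1(\varepsilon, q), \qquad (2.3)$$

where $\omega_1$ satisfies the condition

$$\lim_{\varepsilon \to 0} \bigl[\omega_1(\varepsilon, q) \|\beta(\varepsilon, p_0)\|^{-1}\bigr] = 0.$$

It is required to prove that for any $q \in Q$

$$\lim_{\varepsilon \to 0} \bigl[\|\beta(\varepsilon, p_\varepsilon)\| \|\beta(\varepsilon, p_0)\|^{-1}\bigr] = 1. \qquad (2.4)$$

Let $\{\varepsilon_n\}$, n = 1, 2, ..., be an arbitrary infinitely small sequence:

$$\lim_{n \to \infty} \varepsilon_n = 0.$$

Then, by the stability condition assumed,

$$\lim_{n \to \infty} p_{\varepsilon_n} = p_0.$$

By condition 2) of Theorem 2, $\|\beta(\varepsilon, p_{\varepsilon_n})\| \|\beta(\varepsilon, p_0)\|^{-1} \to 1$ as $n \to \infty$ uniformly on the
whole set $\mathcal{E}$, and this means that for any $\delta > 0$ and any $\varepsilon \in \mathcal{E}$ there exists a natural
number $m = m(\delta)$ such that for any $n > m$ the following inequality is satisfied:

$$\bigl| \|\beta(\varepsilon, p_{\varepsilon_n})\| \|\beta(\varepsilon, p_0)\|^{-1} - 1 \bigr| < \delta. \qquad (2.5)$$

In place of $\varepsilon$ in the inequality (2.5) it is possible to substitute $\varepsilon_n$, and it is then proved
that for any infinitely small sequence $\{\varepsilon_n\}$ and any function $q \in Q$

$$\lim_{n \to \infty} \bigl(\|\beta(\varepsilon_n, p_{\varepsilon_n})\| \|\beta(\varepsilon_n, p_0)\|^{-1}\bigr) = 1.$$

This proves the validity of (2.4) and (2.3).

Using (2.3) and (2.2), we obtain

$$\|\alpha(\varepsilon, q)\| \leq (C + \gamma(\varepsilon, q)) \|\beta(\varepsilon, p_0)\|, \qquad \gamma = (\|\omega\| + \omega_1) \|\beta\|^{-1} \to 0 \ \text{as } \varepsilon \to 0, \qquad (2.6)$$

which is what it was required to prove.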
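The content of the two theorems can be illustrated on a scalar toy model. Everything in the following Python sketch is an assumption chosen for illustration and does not come from the paper: the maps $A_0 p = p + p^3$ and $A_\varepsilon p = A_0 p + \varepsilon p^2$, the datum q = 2 (so that $p_0 = 1$), the bisection-based inversion, and the function names. Here the constant C is the derivative of $A_0^{-1}$ at q.

```python
def invert(op, q, lo=-10.0, hi=10.0, tol=1e-13):
    """Invert a strictly increasing scalar map on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if op(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def A0(p):
    """Unperturbed toy operator (strictly increasing)."""
    return p + p ** 3

def A_eps(p, eps):
    """Perturbed toy operator; strictly increasing for small eps."""
    return A0(p) + eps * p ** 2

q = 2.0
p0 = invert(A0, q)                    # solution of the unperturbed inverse problem
C = 1.0 / (1.0 + 3.0 * p0 ** 2)       # derivative of the inverse of A0 at q
for eps in (1e-1, 1e-2, 1e-3):
    p_eps = invert(lambda p: A_eps(p, eps), q)
    alpha = p_eps - p0                # error of the inverse problem
    beta = A_eps(p0, eps) - A0(p0)    # error of the direct problem at p0
    print(eps, abs(alpha), C * abs(beta))
```

In this model the error of the inverse problem shrinks with $\varepsilon$ and stays close to C times the error of the direct problem evaluated at $p_0$, which is the behaviour the asymptotic estimate describes.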


3. The results obtained permit us to judge the stability of the solution of the inverse
problem with a parameter and to estimate this solution by an indirect method without having
recourse to the solution of the inverse perturbed problem or the corresponding linearized
problem of small perturbations, as is done traditionally. For an estimate by formula (2.6) it
is necessary to obtain the solution of the inverse unperturbed problem and estimate the error
of the solution of the corresponding direct problem and the norm of the operator $A_0^{-1}$.


We must mention the strong constraints imposed on the properties of the operators in the
proofs of the theorems. The problem considered in the introduction is an example where the
conditions of the theorems are satisfied. Of course, not all of them are verified with full
mathematical rigour: some of the conditions, such as the requirement of uniform convergence of $\beta$ and
non-emptiness of the set Q, are checked by a numerical experiment [2], that is, on a discrete
set of functions and points. The continuity of the inverse operator $A_0^{-1}$, which is obviously
the strongest requirement, in the case of some minimal and natural constraints on the set of
functions $p_*$ and q follows from the continuity of the linear-fractional function. We note that
this condition can be replaced by the requirement of compactness of the set P [3].


The results of paragraphs 1, 2 imply an estimate of the difference of the solutions of 
problems 3, 4 in the form 


OF 


Pe Pe? 


| [qe(z) - g(z)], 
Oq | g=q(x) 


OF 
Pee — Poo & — [ze(q) — x(9)], 


Z| x=x(q) 
where $q_\varepsilon(x)$ is the solution of problem 1 for $p_* = p_{*0}$ and x(q) ($x_\varepsilon(q)$) is the function inverse
to q(x) ($q_\varepsilon(x)$).


Translated by J. Berry 
A NUMERICAL METHOD OF SOLVING THREE-DIMENSIONAL
DIFFRACTION PROBLEMS* 


A. L. GAPONENKO 
Moscow 


(Received 26 November 1974; revised 7 March 1975) 


A MODIFICATION of the method of non-orthogonal series is proposed for solving 
three-dimensional diffraction problems. A justification of the method is given and numerical 
results are presented. 


Recently methods of solving three-dimensional diffraction problems with axial symmetry 
have been greatly developed. The presence of axial symmetry makes the problem essentially 
two-dimensional, which facilitates the search for the solution of problems of this type. One 
possible method of solving three-dimensional diffraction problems is the method of 
non-orthogonal series. However, it requires the summation of a large number of terms of the 
series, which considerably lengthens the calculations and leads to loss of accuracy. The 
modification of the method of non-orthogonal series proposed in this paper permits the accuracy 
to be increased for a fixed number of basis functions. 


1. Statement of the problem 


We consider the three-dimensional problem of the diffraction of a scalar field by a body of
arbitrary shape. Let a plane wave $u_0 = C_0 \exp(ikz)$ be incident on a bounded body $V$ with
boundary $S$. We suppose that the boundary $S$ is a surface of Lyapunov type. The total field $U$
can be represented in the form


$$U = u_0 + u,$$

where $u_0$ and $u$ are the incident and diffracted fields respectively. We consider the Dirichlet
problem. Then the diffracted field will be the solution of the following problem:

$$\Delta u(M) + k^2 u(M) = 0, \quad M \in V_e,$$

$$u|_S = -C_0 \exp(ikz)|_S, \qquad (1)$$

$$\partial u/\partial r - iku = o(1/r) \quad \text{as } r \to \infty,$$

where $V_e$ is the domain exterior to the surface $S$, and $k = \mathrm{const}$ is the
characteristic of the domain $V_e$.


It is known that, under the assumptions indicated above, problem (1) has a unique solution
$u(M)$ [1].


2. The modified method of non-orthogonal series 


We consider an arbitrary closed surface $S_1$ situated entirely inside the body $V$. We suppose
that the surface $S_1$ is not resonant, that is, that the interior homogeneous Dirichlet problem for
the equation $\Delta u + k^2 u = 0$ has only the trivial solution. We consider the system of functions
$\psi_n(M)$ proposed by V. D. Kupradze [2]:

$$\psi_n(M) = \frac{\exp[ikR(M, M_n)]}{R(M, M_n)},$$

where the points $M_n$ form a countable, everywhere dense set on the surface $S_1$, and $R(M, M_n)$
is the distance between the points $M$ and $M_n$. Using the method proposed in [2] we prove the
following theorem.


Theorem 1 


The system of functions $\{\psi_n(M)\}$ is linearly independent and complete in the space
$L_2(S)$.


Proof. We prove the linear independence of the functions $\psi_n(M)$, $n = 1, 2, \ldots$. Let
the system $\{\psi_n(M)\}$ be linearly dependent, that is, let $N$ and $C_i$, $i = 1, 2, \ldots, N$, where
$|C_1| + \ldots + |C_N| \ne 0$, exist such that

$$\sum_{i=1}^{N} C_i \psi_i(M) = 0, \quad M \in S.$$

We consider the function

$$W(M) = \sum_{i=1}^{N} C_i \psi_i(M), \quad M \in V_e.$$

It is easy to verify that the function $W(M)$ equals zero everywhere outside the surface $S_1$. But
in a sufficiently small neighborhood of the point $M_j$ the function $\psi_j$ assumes a value arbitrarily
great in absolute magnitude, while the other terms in the expression for $W(M)$ are bounded.
Hence we obtain $C_j = 0$, $j = 1, 2, \ldots, N$, which contradicts the assumption about the
coefficients $C_j$. Consequently, the system $\{\psi_n(M)\}$ is linearly independent.


To prove completeness it is sufficient to show that for any $\varepsilon > 0$ and for any function
$\alpha(M) \in L_2(S)$ we can find $N$ and $C_i$, $i = 1, 2, \ldots, N$, such that

$$\left\| \alpha(M) - \sum_{i=1}^{N} C_i \psi_i(M) \right\|_{L_2(S)} < \varepsilon.$$

This statement is equivalent to the following: if $\alpha(M) \in L_2(S)$ and

$$\int_S \alpha(M) \psi_n(M)\, dS = 0, \quad n = 1, 2, \ldots, \qquad (2)$$

then $\alpha(M) = 0$ almost everywhere on $S$. We first prove this for a continuous function $\beta(M)$. Let

$$\int_S \beta(M) \psi_n(M)\, dS = 0.$$

We consider the function

$$F(P) = \int_S \beta(M) \frac{\exp[ikR(M, P)]}{R(M, P)}\, dS_M.$$

It is easy to show that $F(P) = 0$ in all space. This implies that

$$(\partial F/\partial n)_{\text{outward}} = (\partial F/\partial n)_{\text{inward}} = 0.$$

However, by the property of the potential of a simple layer

$$(\partial F/\partial n)_{\text{outward}} - (\partial F/\partial n)_{\text{inward}} = 4\pi\beta(M).$$

Consequently, $\beta(M) = 0$ on $S$. We have proved the closedness of $\{\psi_n(M)\}$ in $C(S)$ in the
sense of the metric $L_2(S)$; consequently, we have proved the completeness of $\{\psi_n(M)\}$ in the
sense of the metric $L_2(S)$. Since the set of continuous functions is everywhere dense in $L_2(S)$,
we have thereby proved the completeness of the system $\{\psi_n\}$ in the space $L_2(S)$. Theorem
1 is proved completely.


Theorem 2

If the sequence $\{u^n(M)\}$ possesses the following properties:

1) $u^n(M)$ satisfies in $V_e$ the Helmholtz equation and the radiation condition at infinity,

2) $\|u^n(M) + C_0 \exp(ikz)\|_{L_2(S)} = \delta_n \to 0$ as $n \to \infty$,

and if $V_e'$ is an arbitrary closed domain, $V_e' \subset V_e$, and $u$ is the exact solution of problem (1),
then

$$\|u - u^n\|_{C(V_e')} \to 0 \quad \text{as } n \to \infty.$$


Proof. We construct the difference between the exact and the approximate solution,
$w(M) = u - u^n$. Then

$$w(M) = \int_S \frac{\partial G}{\partial n_P}(M, P)\, w(P)\, dS_P, \qquad (3)$$

where $G(M, P)$ is Green's function of the exterior Dirichlet problem for the Helmholtz equation.
Since in any closed region $V_e'$ situated within the domain $V_e$ the derivative $\partial G(M, P)/\partial n_P$ is bounded,
applying the Cauchy inequality to (3) we obtain

$$\|w(M)\|_{C(V_e')} \le K\delta_n \to 0 \quad \text{as } n \to \infty.$$

Here $K$ is a constant estimating the integral

$$\int_S \left| \frac{\partial G}{\partial n_P}(M, P) \right|^2 dS_P \le K^2, \quad M \in V_e'.$$

Theorem 2 is completely proved.


Let $\{\delta_n\}$ be an arbitrary numerical sequence, $\delta_n \to 0$ as $n \to \infty$. By Theorem 1, for
any $n$ we can find a number $N = N(n)$, points $\{M_i^n\}$, $M_i^n \in S_1$, $i = 1, 2, \ldots, N$, inducing
corresponding functions $\psi_i^n(M)$, $i = 1, 2, \ldots, N$, and also coefficients $\{C_i^n\}$, $i = 1, 2, \ldots, N$,
such that the inequality

$$\left\| \sum_{i=1}^{N} C_i^n \psi_i^n(M) + C_0 \exp(ikz) \right\|_{L_2(S)} \le \delta_n$$

will be satisfied. We write

$$u_N^n(M) = \sum_{i=1}^{N} C_i^n \psi_i^n(M). \qquad (4)$$

Since the sequence (4) defined above satisfies the conditions of Theorem 2, it satisfies

$$\|u - u_N^n\|_{C(V_e')} \to 0 \quad \text{as } n \to \infty.$$


Therefore the elements of the sequence (4) can be regarded as an approximate solution of
problem (1). Consequently, to find an approximate solution of the original diffraction problem it is
sufficient to solve the problem of minimizing the functional

$$\left\| \sum_{i=1}^{N} C_i \psi_i(M) + C_0 \exp(ikz) \right\|_{L_2(S)}. \qquad (5)$$


Usually a functional of the type (5) is minimized by searching for coefficients of an expansion for
a fixed system of basis functions. In the modified method of non-orthogonal series it is proposed
to minimize the functional (5) by varying not only the coefficients $C_i$ but also
the basis functions themselves. In our case the basis functions are
varied by varying the coordinates of the points $M_i(x_i, y_i, z_i)$. For a fixed number of basis
functions this procedure permits us to find the optimal distribution of the points $\{M_i\}$ for a
given perturbation. If we note that the function $\exp[ikR(M, M_i)]/R(M, M_i)$ is the scalar field
created by a point source situated at the point $M_i$, then the procedure indicated can be given a
certain physical interpretation. We seek a disposition of point sources within the body and a
distribution of their intensities such that the total field of these sources will be the same as the
field arising in the diffraction of a plane wave.


It is also possible to give a mathematical interpretation of the algorithm for searching for
the coefficients of the expansion $C_i$ and the simultaneous search for the coordinates of the
points $M_i$. In the solution of the problem of approximating a given function $u_0$ by a segment
of the series $\sum_{n=1}^{N} C_n \psi_n$, where $\{\psi_n\}$ is a complete system of functions, the following
situation may arise. The hyperplane formed by the elements $\psi_n$, $n = 1, 2, \ldots, N$, may be
almost orthogonal to the element $u_0$. It is then impossible to obtain a better approximation to the
element $u_0$ by means of the given $N$ functions. In the modified method of non-orthogonal series
we seek $N$ functions $\{\psi_n\}$ such that the norm of the projection of the element $u_0$ on the
hyperplane formed by these $N$ elements $\{\psi_n\}$ is a maximum.


In the solution of diffraction problems the final aim of the investigation is often to obtain
radiation patterns. We introduce a system of spherical coordinates with centre at the point $O \in V$.
Let $(r, \theta, \varphi)$ be the coordinates of the point $M$. Then the field $u(M)$ in the far zone can be written
in the form

$$u(M) = \frac{\exp(ikr)}{r} D(\theta, \varphi) + o\!\left(\frac{1}{r}\right),$$

where $D(\theta, \varphi)$ is the field radiation pattern. In the case where an approximate solution

$$u_N(M) = \sum_{i=1}^{N} C_i \psi_i(M)$$

is known, the corresponding approximation to the actual radiation pattern has the form

$$D_N(\theta, \varphi) = \sum_{n=1}^{N} C_n \exp[-ik(\sin\theta\cos\varphi\, x_n + \sin\theta\sin\varphi\, y_n + \cos\theta\, z_n)].$$

Here $x_n$, $y_n$, $z_n$ are the Cartesian coordinates of the point $M_n$.
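
The approximate radiation pattern above is a finite sum and can be evaluated directly. The following is a minimal sketch in Python, not the author's BESM-6 program; the function name `radiation_pattern` and the argument layout (a list of complex coefficients and a list of source coordinates) are assumptions made for illustration.

```python
import cmath
import math


def radiation_pattern(coeffs, sources, k, theta, phi):
    """Evaluate D_N(theta, phi) = sum_n C_n exp(-ik (n . r_n)).

    coeffs  -- complex amplitudes C_n of the internal point sources
    sources -- (x_n, y_n, z_n) Cartesian coordinates of the points M_n
    k       -- wave number
    (names and data layout are illustrative assumptions)
    """
    # Unit vector of the observation direction.
    nx = math.sin(theta) * math.cos(phi)
    ny = math.sin(theta) * math.sin(phi)
    nz = math.cos(theta)
    return sum(
        c * cmath.exp(-1j * k * (nx * x + ny * y + nz * z))
        for c, (x, y, z) in zip(coeffs, sources)
    )
```

A single source placed at the origin gives a direction-independent pattern equal to its amplitude, which is a convenient sanity check of the sign and phase conventions.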


Remark 1. The case of the perturbation of a plane wave was considered only for 
definiteness. The description of the method is unchanged if a perturbation of any other form 
is considered. 


Remark 2. The method transfers without difficulty to the case of the second and other 
boundary value problems. 


3. Numerical realization of the method 


Thus, the original problem (1) has been reduced to the problem of finding the minimum
of a function of $5N$ variables:

$$F(\operatorname{Re} C_j, \operatorname{Im} C_j, x_j, y_j, z_j) = \int_S \left| \sum_{j=1}^{N} C_j \frac{\exp[ikR(M, M_j)]}{R(M, M_j)} + C_0 \exp(ikz) \right|^2 dS_M.$$





Various methods of minimizing functions of many variables can be used to solve this problem.
For the numerical realization the method of conjugate gradients (the Fletcher-Reeves method
[3]) was used. This method was chosen as the most appropriate to the specific nature of the
given problem. To illustrate the operation of the algorithm we give the results of a calculation
of the following two simulated cases.

Case 1. A plane wave with wave number $k = 0.6$ and amplitude $C_0 = 10$ is incident on an
ellipsoid with axes $a = 1$, $b = 2$, $c = 3$ from the direction of the greatest axis.

Case 2. A plane wave with $k = 1$, $C_0 = 10$ is incident on a sphere of radius $r = 1$.
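
To make the objective of Section 3 concrete, the sketch below evaluates the function $F$ for a sphere centred at the origin (the geometry of Case 2) by a simple midpoint quadrature in the spherical angles. It is an illustrative assumption throughout: the name `discrepancy_on_sphere`, the quadrature rule and the grid sizes are not taken from the paper, and the minimization itself (Fletcher-Reeves in the original) is not shown.

```python
import cmath
import math


def discrepancy_on_sphere(coeffs, sources, k, c0, radius=1.0,
                          n_theta=64, n_phi=128):
    """Approximate F = int_S |sum_j C_j psi_j(M) + C_0 exp(ikz)|^2 dS
    over a sphere of the given radius centred at the origin.

    The sources must lie strictly inside the sphere so that R > 0.
    Midpoint rule in theta and phi (an assumed, not the original, quadrature).
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            x = radius * math.sin(theta) * math.cos(phi)
            y = radius * math.sin(theta) * math.sin(phi)
            z = radius * math.cos(theta)
            # Incident plane wave plus the fields of the point sources.
            field = c0 * cmath.exp(1j * k * z)
            for c, (xs, ys, zs) in zip(coeffs, sources):
                dist = math.sqrt((x - xs) ** 2 + (y - ys) ** 2 + (z - zs) ** 2)
                field += c * cmath.exp(1j * k * dist) / dist
            # Surface element dS = r^2 sin(theta) dtheta dphi.
            total += abs(field) ** 2 * radius ** 2 * math.sin(theta) * d_theta * d_phi
    return total
```

With no sources the integrand is $|C_0|^2$, so for $C_0 = 10$ and unit radius the value should be close to $400\pi$; this gives a quick check of the quadrature before any minimizer is attached.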


The calculations were performed on the BESM-6 computer for twenty basis functions. In
the case of diffraction by the sphere, after 65 steps the Fletcher-Reeves method succeeded in
attaining a decrease of the function (5) from 417 to 0.44. The value of the function (5) can be
interpreted as the discrepancy in $L_2(S)$. The field at points of the surface $S$ was calculated by
Eq. (6). The maximum value of $|u_N + u_0|^2$ can be interpreted as the discrepancy in $C$. If
the value of the discrepancy is referred to the square of the amplitude of the incident field, then
we obtain a percentage expression. In the solution obtained for the sphere the values of the
discrepancies in $L_2$ and $C$ were 0.44 and 1.4% respectively. The results obtained for diffraction
by the sphere were the same as those published in [4].


In the case of diffraction by the ellipsoid, after 30 steps the function (5) was reduced from
33.18 to 0.39. The values of the discrepancies in $L_2$ and $C$ amounted to 0.39 and 2.48%
respectively. The data given show a clear difference in the number of steps, and
consequently also in the solution time, between the sphere and the ellipsoid. This
is explained by the fact that for the ellipsoid a more favourable zeroth approximation was chosen,
taking into account some of the regularities in the solution of the diffraction problem for the
sphere. It must be mentioned that the function (5) is not convex, so that there apparently
exists a set of local minima, among which one or more are global. However, for the solution of
the problem it is unimportant whether we begin in the neighborhood of a global or of a
local minimum, since the accuracy of the solution depends only on the value
of the discrepancy in $L_2(S)$.

Figure 1 shows the radiation pattern of the field diffracted by the sphere after 25 (curve 1),
35 (curve 2) and 65 (curve 3) steps, with values of the discrepancies in $L_2$ of 3.3, 1.08 and 0.44%
respectively. It should be mentioned that in the solution of the problem of diffraction by the
sphere (in this case the exact solution is symmetrical with respect to the angle $\varphi$) a sufficiently
good symmetry with respect to the angle $\varphi$ was obtained. The radiation pattern obtained
deviates from symmetry by not more than 0.16%.


Figure 2 shows the radiation pattern of the field diffracted by the ellipsoid. The sections 
of the radiation pattern are shown in the OXZ plane (continuous line) and the OYZ plane 
(dashed line). 


In the process of solving the above two problems, and also a number of other problems of
the diffraction of a plane wave by three-dimensional bodies, some tendencies in the behaviour
of the coefficients and in the disposition of the internal sources could be distinguished. The
internal sources are so arranged that, relative to the geometrical centre of the body on which
diffraction occurs, they are displaced toward the front of the incident wave. For the sources
furthest from the wave front $\operatorname{Re} C_i$ and $\operatorname{Im} C_i$ are negative; for those close to the wave front they
are positive. As the sources approach the wave front, $\operatorname{Re} C_i$ and $\operatorname{Im} C_i$ increase monotonically. If
we choose a zeroth approximation allowing for this behaviour, the convergence of
the method can be improved.

The author thanks V. V. Kravtsov for his guidance and interest.


Translated by J. Berry. 
SOLUTION OF THE INVERSE PROBLEM OF THE DISPERSION OF
AN ELECTROMAGNETIC PULSE IN A CONDUCTING MEDIUM* 


V. V. YANKOV 


Moscow 


(Received 3 March 1975; revised 27 June 1975)


A SIMPLE analytic expression is obtained for the initial shape of a plane electromagnetic wave
pulse in terms of the shape acquired by the pulse during propagation in a homogeneous conducting
medium, in the form of the exact solution of an integral equation of the first kind of the
convolution type.


It is well known that the propagation of an electromagnetic perturbation in a homogeneous
isotropic conducting medium is described by a partial differential equation of hyperbolic
type, the "telegraph equation"

$$\Delta u = \frac{\varepsilon\mu}{c^2}\frac{\partial^2 u}{\partial t^2} + \frac{4\pi\sigma\mu}{c^2}\frac{\partial u}{\partial t},$$

where $u$ is any of the components of the electromagnetic field $E_x$, $E_y$, $E_z$ or $H_x$, $H_y$, $H_z$ in an
arbitrary Cartesian coordinate system, $\varepsilon$ and $\mu$ are the dielectric constant and magnetic
permeability independent of frequency, $\sigma$ is the constant electrical conductivity of the medium,
and $c$ is the velocity of light in a vacuum (see, for example, [1, 2]). Here, both in the absence
of dispersion of the dielectric constant and in the frequently encountered case where it is
permissible to neglect the effect of displacement currents on the spreading of the pulse in
comparison with the conduction current (see, for example, [3]), the change in shape of the pulse
$u(x, y, z, t)$ (without allowing for the time delay of the pulse as a whole) satisfies the parabolic
heat-conduction equation





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 273-276, 1977. 
of a one-dimensional boundary value problem for the half-space $x > 0$ without initial
conditions ($t > -\infty$) and with the boundary condition

$$u(0, t) = f(t) \qquad (2)$$

for any bounded piecewise-continuous function $f(t)$ has the form (see, for example, [2])

$$u(x, t) = \int_{-\infty}^{t} f(\tau)\, K(x, t - \tau)\, d\tau, \qquad (3)$$

where the binary source function

In the physical problem considered the impulse function (4) describes a deformation
proportional to the propagation in the conducting medium of a perturbation, defined as a
$\delta$-function $f(t) = \delta(t)$ and actuated in the plane $x = 0$.


We are interested in the problem of determining the original shape of the pulse $f(t)$ from the
shape $u(x, t)$ distorted by transient phenomena in a homogeneous transmission line with a pulse
characteristic of the form (4). Previously (see, for example, [4, 5]) this problem was discussed exclusively from
the point of view of solving the convolution-type integral equation (3) directly on a computer
by one of the general methods for the approximate solution
of ill-posed problems, a regularization method.


Meanwhile it will be shown below that the efficient use of some general analytic properties
of the solutions of the heat-conduction equation (1), together with the specific integral relation
for the kernel (4) of the integral equation,

$$\int_0^t K(x_1, \tau)\, K(x_2, t - \tau)\, d\tau = K(x_1 + x_2, t) \qquad (5)$$

(see, for example, [6]), leads to a characteristic inversion of formula (3), which is in principle
equivalent to obtaining the exact solution of the integral equation (3).


Therefore, from the mathematical aspect the determination of the shape of the
electromagnetic pulse $f(t)$ at the plane boundary $x = 0$ of the homogeneous half-space $x > 0$ from
the shape of the same pulse $u(x, t)$ at a distance $x > 0$ from the boundary belongs, in general,
to the category of so-called inverse problems (see, for example, [7]). However, unlike the usual
formulation of similar problems, in which their ill-posedness is stipulated at the very beginning
and is in some way immediately introduced into the scheme for the approximate solution
of the integral equation (3), we confine ourselves to an adequate representation of the
solution $f(t)$ of the given inverse problem in the most general analytic form, that is, essentially, to
the reduction of one ill-posed problem to another, simpler one.

In other words, we here want to find the exact solution of the integral equation (3) on the set
$F$ of piecewise-continuous functions $f(t) \in F$ only for such initial functions $u(x, t)$ as
necessarily belong to the set $AF$, $u(x, t) \in AF$, where $AF$ is the image of the set $F$ under the
mapping performed by the integral operator


For this purpose we use primarily the fact that the solutions $u(x, t)$ of Eq. (1) are analytic functions
of the variable $x$ [8], which can be expanded in an infinite Taylor series in powers of the
difference $x_0 - x$ in the neighborhood of any point $x > 0$:

$$u(x_0, t) = u(x, t) + \frac{x_0 - x}{1!}\frac{\partial u(x, t)}{\partial x} + \frac{(x_0 - x)^2}{2!}\frac{\partial^2 u(x, t)}{\partial x^2} + \ldots, \qquad (6)$$

if $|x_0 - x| < \infty$. Then, in particular, in the limit as $x_0 \to +0$ the expansion

$$\lim_{x_0 \to +0} u(x_0, t) = u(x, t) - \frac{x}{1!}\frac{\partial u(x, t)}{\partial x} + \frac{x^2}{2!}\frac{\partial^2 u(x, t)}{\partial x^2} - \ldots$$

holds, which in consequence of the boundary condition (2), understood in the sense of the limiting
value

$$u(0, t) = \lim_{x \to +0} u(x, t) = f(t),$$

must be identical with the unknown function $f(t)$ at all points of its continuity:

$$f(t) = u(x, t) - \frac{x}{1!}\frac{\partial u(x, t)}{\partial x} + \frac{x^2}{2!}\frac{\partial^2 u(x, t)}{\partial x^2} - \ldots, \qquad (7)$$

since the function (3) has the limiting value $f(t)$ as $x \to +0$ only if the function $f(t)$ is continuous
at the point $t$ (see, for example, [6]). Then we also expand by (6), in a Taylor series with centre at the
point $x$, the auxiliary function $u(2x, t)$, which on the other hand, by (3), (5), is equal to the convolution
of the original function $u(x, t)$ with the impulse function (4), namely

$$\int_{-\infty}^{t} u(x, \tau)\, K(x, t - \tau)\, d\tau = u(x, t) + \frac{x}{1!}\frac{\partial u(x, t)}{\partial x} + \frac{x^2}{2!}\frac{\partial^2 u(x, t)}{\partial x^2} + \ldots \qquad (8)$$


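We then add separately the left and right sides of formulas (7), (8), after which we solve the
result for $f(t)$:

$$f(t) = 2\left[ u(x, t) + \sum_{n=1}^{\infty} \frac{x^{2n}}{(2n)!}\frac{\partial^{2n} u(x, t)}{\partial x^{2n}} \right] - \int_{-\infty}^{t} u(x, \tau)\, K(x, t - \tau)\, d\tau. \qquad (9)$$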


We now consider the general equation for $u(x, t)$

$$\frac{\partial^{2n} u(x, t)}{\partial x^{2n}} = \frac{1}{a^{2n}}\frac{\partial^{n} u(x, t)}{\partial t^{n}}, \qquad (10)$$

obtained by differentiating Eq. (1) $n - 1$ times with respect to time, with subsequent change
of the order of differentiation on the left side with respect to $t$ and with respect to $x$ and taking
(1) into account, since the solution (3) of Eq. (1) is analytic in $x$ and has continuous derivatives of
all orders with respect to $t$ for $0 < x < \infty$ [8]. Finally, replacing under the summation sign in (9)
the spatial derivatives by time derivatives, using (10), we arrive at a simple analytic expression for
the solution of the inverse problem posed:

$$f(t) = 2\left[ u(x, t) + \sum_{n=1}^{\infty} \frac{1}{(2n)!}\left(\frac{x}{a}\right)^{2n}\frac{\partial^{n} u(x, t)}{\partial t^{n}} \right] - \int_{-\infty}^{t} u(x, \tau)\, K(x, t - \tau)\, d\tau. \qquad (11)$$


With the assumptions made about the class of functions $f(t)$, the uniqueness of the solution
(11) of the integral equation (3), within the limits of the statement of the problem formulated
above, follows directly from the proposed method of sequential derivation of the inversion formula
(11).


In particular, formula (11) gives the correct solution of the inverse problem in the case of a
monochromatic boundary function of time $f(t) = \mathrm{const}\cdot\exp(i\omega t)$ with an arbitrary frequency
$\omega$, when the exact connection between the functions $u(x, t)$ and $f(t)$, namely $f(t) = u(x, t)\exp[(x/a)(i\omega)^{1/2}]$,
is known from the solution of the corresponding direct problem in complex form (see, for example,
[2])

$$u(x, t) = \mathrm{const}\cdot\exp\left[ -\frac{x}{a}(i\omega)^{1/2} + i\omega t \right], \qquad (12)$$

which is easily verified by substitution of (12) into (11) and summation of the series.
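
The substitution of (12) into (11) can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it takes const = 1, uses $\partial^n u/\partial t^n = (i\omega)^n u$ for the monochromatic field, truncates the series after a fixed number of terms, and replaces the convolution integral by $u(2x, t)$, which is what relations (3) and (5) give for it. The function name `reconstruct_boundary_value` is hypothetical.

```python
import cmath
from math import factorial


def reconstruct_boundary_value(x, a, omega, t, terms=30):
    """Evaluate the right side of formula (11) for the monochromatic
    field (12) with const = 1 and return the reconstructed f(t)."""
    s = (x / a) * cmath.sqrt(1j * omega)   # (x/a)(i*omega)^(1/2), principal branch
    u = cmath.exp(-s + 1j * omega * t)     # u(x, t) from (12)
    # Truncated series: sum_{n>=1} (x/a)^(2n) / (2n)! * d^n u / dt^n.
    series = u * sum(
        (x / a) ** (2 * n) * (1j * omega) ** n / factorial(2 * n)
        for n in range(1, terms + 1)
    )
    # By (3) and (5) the convolution of u(x, .) with K(x, .) equals u(2x, t).
    convolution = cmath.exp(-2.0 * s + 1j * omega * t)
    return 2.0 * (u + series) - convolution
```

For moderate values of $x/a$ and $\omega$ the result should agree with $\exp(i\omega t)$ to rounding error, since the series sums to $u(\cosh s - 1)$ and $2u\cosh s - u e^{-s} = u e^{s}$.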


Therefore, the problem of reconstructing the original undistorted shape $f(t)$ of an
electromagnetic pulse, distorted in traversing the path $x$ in a homogeneous conducting medium, is
rigorously reduced to the operations of differentiation and integration with respect to time of the
observed shape $u(x, t)$ of the pulse. When the pulse shape is calculated approximately
with a given error, the number of terms of the series (11) to be calculated will
naturally depend on the value of the expansion parameter $x/a$ and on the time characteristics of the
pulse investigated.


Translated by J. Berry. 
CONVERGENCE OF NEWTON’S ITERATIVE METHOD FOR SOLVING
GAS-DYNAMIC DIFFERENCE EQUATIONS* 


Yu. P. POPOV and E. A. SAMARSKAYA 
Moscow 


(Received 19 February 1976) 


THE CONVERGENCE conditions of Newton’s iterative method applied to the solution of implicit
difference schemes for one-dimensional non-stationary gas-dynamic equations in Lagrangian
mass coordinates are investigated. The results obtained for the adiabatic case, taking into account
linear pseudoviscosity, are compared with previously known conditions for isothermal flows in the
absence of viscosity.


Iterative methods are usually used in gas-dynamic problems to solve implicit difference
schemes, consisting of systems of non-linear algebraic equations. The numerical solution of a
system of one-dimensional non-stationary gas-dynamic equations in Lagrangian mass coordinates
by means of Newton’s iterative method is described in [1, 2]. A theoretical analysis, and also the
results of calculations, testify to the fact that in this case Newton’s method possesses certain
advantages over other iterative methods, for example the “explicit iteration” method, since it
permits the use of coarse meshes with a comparatively large time step. Estimates of the convergence
of Newton’s iterative process were made in [1, 2] for the isothermal case without allowing for
pseudoviscosity.

In the present paper these estimates are generalized for the adiabatic case in the presence of
linear viscosity.


1. The system of equations of gas dynamics, describing the one-dimensional plane non-stationary 
flow of a gas in Lagrangian mass variables for the adiabatic case, can be written in the form [2, 3] 





*Zh. vychisl. Mat. mat. Fiz., 17, 1, 276—280, 1977. 
Here $t$ is the time, $x$ is the Eulerian variable, $\eta$ is the specific volume, $s$ ($ds = \eta^{-1}dx$) is the
Lagrangian mass variable, $v$, $p$, $\varepsilon$, $T$ are respectively the velocity, pressure, internal energy and
temperature of the gas, $\omega$ is the viscosity, $g$ is the so-called total pressure, and the time
derivative is Lagrangian. The last two relations in (1) are the thermodynamic equations of state.

Equations (1) are solved in some domain $\Omega = \{0 < s < M,\ t > 0\}$, on whose boundaries
$s = 0$ and $s = M$ boundary conditions are specified, for example, the laws of variation of velocity
or pressure with time.


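For definiteness we will consider an ideal gas with the equations of state

$$p = RT/\eta, \quad \varepsilon = a p \eta, \quad a = 1/(\gamma - 1),$$

and with the linear viscosity

$$\omega = \begin{cases} -\dfrac{\nu}{\eta}\dfrac{\partial v}{\partial s}, & \partial v/\partial s < 0, \\ 0, & \partial v/\partial s \ge 0. \end{cases}$$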


To construct a difference scheme approximating the system of differential equations (1), we
introduce in the domain $\Omega$ a mesh, uniform for simplicity,

$$\omega_{h\tau} = \{(s_i, t_j),\ i = 0, 1, \ldots, N,\ j = 0, 1, 2, \ldots;\ s_{i+1} = s_i + h,\ t_{j+1} = t_j + \tau\}.$$

To the nodes $(s_i, t_j)$ (“integral points”) of the mesh $\omega_{h\tau}$ we refer the functions $x = x_i^j$, $v = v_i^j$,
and to the “half-integral points” $(s_{i+1/2}, t_j)$ we refer the functions $p = p_{i+1/2}^j$, $\eta = \eta_{i+1/2}^j$, $\varepsilon = \varepsilon_{i+1/2}^j$,
$T = T_{i+1/2}^j$, $\omega = \omega_{i+1/2}^j$, $g = g_{i+1/2}^j$, where $s_{i+1/2} = s_i + h/2$.


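The completely conservative difference scheme approximating the system of equations (1)
in the case of an ideal gas has the form [1, 2, 4]

$$v_t = -g_{\bar s}^{(\sigma)}, \quad x_t = v^{(0.5)}, \quad \eta_t = v_s^{(0.5)}, \quad \varepsilon_t = -g^{(\sigma)} v_s^{(0.5)},$$

$$g = p + \omega, \quad \omega = -\nu v_s/\eta, \quad p = RT/\eta, \quad \varepsilon = a p \eta. \qquad (2)$$

The parameter $0 \le \sigma \le 1$ is arbitrary.

The scheme (2) is written in the indexless notation [1, 2, 4, 5]

$$y = y_i^j, \quad \hat y = y_i^{j+1}, \quad y^{(\sigma)} = \sigma \hat y + (1 - \sigma) y,$$

$$y_t = (\hat y - y)/\tau, \quad y_s = (y_{i+1} - y_i)/h, \quad y_{\bar s} = (y_i - y_{i-1})/h. \qquad (3)$$

Transforming the last four equations in (2), we can rewrite this system of equations in the form

$$v_t = -g_{\bar s}^{(\sigma)}, \quad x_t = v^{(0.5)}, \quad \eta_t = v_s^{(0.5)}, \quad \varepsilon_t = -g^{(\sigma)} \eta_t, \qquad (2')$$

$$\varepsilon - a g \eta - a \nu v_s = 0.$$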
For $\sigma = 0$ the system of difference equations (2) or (2') is solved explicitly, but one thereby
obtains a scheme that is only conditionally stable, with an extremely strict constraint on the time step of the
mesh. In the acoustic approximation the stability condition has the form $\tau < k h^2$, where $k$ is
some constant [1, 2]. For $\sigma \ge 0.5$ the difference scheme (2) is unconditionally stable, but
in this case iterative methods must be used to solve it.


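2. The application of Newton’s method to the system of non-linear equations (2') for finding
the unknown values of the mesh functions $v$, $x$, $p$, $\eta$, $\varepsilon$, $T$, $\omega$, $g$ on the top time layer
$t_{j+1}$ for $\sigma \ge 0.5$ leads to the equations

$$\delta v + \sigma\tau\,\delta g_{\bar s} = -f_1, \quad \delta x - 0.5\tau\,\delta v = -f_2, \quad \delta x_s - \delta\eta = -f_3,$$

$$\delta\varepsilon + g^{(\sigma)}\delta\eta + \sigma\tau\eta_t\,\delta g = -f_4, \quad \delta\varepsilon - a\eta\,\delta g - a g\,\delta\eta - a\nu\,\delta v_s = -f_5. \qquad (4)$$

Here $\delta y$ is the difference in the values of the mesh function $y$ at the adjacent $(k + 1)$-th and $k$-th
iterations:

$$\delta y = \delta y^{[k+1]} = y^{[k+1]} - y^{[k]}. \qquad (5)$$

Here all the unknown increments $\delta y$ have the iteration number $k + 1$, and the coefficients of the
equations $g^{(\sigma)}$, $\eta_t$, $g$, $\eta$ and the right sides $f_p$, $p = 1, 2, \ldots, 5$, are calculated at the lower
$k$-th iteration and are regarded as known. As the initial “zeroth” iteration $y^{[0]}$ the preceding time
layer, $y^{[0]} = y^j$, can be used.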


After the elimination of all the unknown functions except $\delta v$, the system of linear equations
(4) reduces at each mesh node to a three-point equation

$$A_i\,\delta v_{i-1} - C_i\,\delta v_i + B_i\,\delta v_{i+1} = -F_i, \quad i = 1, 2, \ldots, N - 1, \qquad (6)$$

whose coefficients $A_i$, $B_i$, $C_i$ and the right side $F_i$ depend only on the values of the mesh functions
at the $k$-th iteration and $j$-th time layer.

Equations (6) are solved at each iteration by pivotal condensation [2, 5]; the iterations are
continued until some convergence criterion is satisfied, for example, until the increments $\delta y$ at all the mesh
nodes become fairly small.
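
Pivotal condensation for a three-point equation of the form (6) is a forward sweep for two auxiliary coefficients followed by back substitution. The sketch below is a generic illustration in Python, not the code of [2, 5]; the function name `solve_three_point`, the list layout and the assumption that the boundary values $y_0$ and $y_N$ are prescribed are all choices made for the example.

```python
def solve_three_point(A, B, C, F, y0, yN):
    """Solve A[i]*y[i-1] - C[i]*y[i] + B[i]*y[i+1] = -F[i], i = 1..N-1,
    with prescribed boundary values y[0] = y0 and y[N] = yN.

    A, B, C, F are lists of length N+1; entries 0 and N are ignored.
    Assumes C[i] - A[i]*alpha[i] never vanishes (e.g. diagonal dominance).
    """
    N = len(A) - 1
    alpha = [0.0] * (N + 1)
    beta = [0.0] * (N + 1)
    # Representation y[i-1] = alpha[i]*y[i] + beta[i]; at i = 1 it encodes y[0] = y0.
    alpha[1], beta[1] = 0.0, y0
    for i in range(1, N):
        denom = C[i] - A[i] * alpha[i]
        alpha[i + 1] = B[i] / denom
        beta[i + 1] = (A[i] * beta[i] + F[i]) / denom
    y = [0.0] * (N + 1)
    y[0], y[N] = y0, yN
    # Back substitution from the right boundary.
    for i in range(N - 1, 0, -1):
        y[i] = alpha[i + 1] * y[i + 1] + beta[i + 1]
    return y
```

For example, with $A_i = B_i = 1$, $C_i = 2$, $F_i = 0$ the equation is the discrete Laplace equation, and the solution between the boundary values 0 and 4 on five nodes is the straight line 0, 1, 2, 3, 4.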


3. We study the convergence of the Newtonian iterative process described above. We subtract
from each equation of the system (4) the corresponding equation of the system (2'). As a result we
obtain the following system of equations:

$$\Delta v^{[k+1]} + \sigma\tau\,\Delta g_{\bar s}^{[k+1]} = 0, \quad \Delta x^{[k+1]} - 0.5\tau\,\Delta v^{[k+1]} = 0,$$

$$\Delta x_s^{[k+1]} - \Delta\eta^{[k+1]} = 0,$$

$$\Delta\varepsilon^{[k+1]} + \sigma\eta^{[k]}\Delta g^{[k+1]} + \sigma g^{[k]}\Delta\eta^{[k+1]} - \sigma\,\Delta\eta^{[k]}\Delta g^{[k]} - \sigma\eta\,\Delta g^{[k+1]} + (1 - \sigma) g\,\Delta\eta^{[k+1]} = 0, \qquad (7)$$

$$\Delta\varepsilon^{[k+1]} - a\eta^{[k]}\Delta g^{[k+1]} - a g^{[k]}\Delta\eta^{[k+1]} + a\,\Delta\eta^{[k]}\Delta g^{[k]} - a\nu\,\Delta v_s^{[k+1]} = 0.$$

Here $\Delta y^{[k]} = y^{[k]} - y^{j+1}$ is the difference between the value of the mesh function at the $k$-th
iteration and the exact solution of the difference problem. We note that this notation differs from
the notation of (5), where the difference is taken between adjacent iterations.


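The system (7) is more suitable for theoretical analysis. Eliminating from (7) all the functions
$\Delta y^{[k+1]}$ except $\Delta g^{[k+1]}$, we arrive at the equation

$$A_i^{[k]} z_i^{[k+1]} - B_i^{[k]}\left( z_{i+1}^{[k+1]} + z_{i-1}^{[k+1]} \right) = F_i^{[k]}, \quad i = 1, 2, \ldots, N - 1, \qquad (8)$$

where

$$z_i^{[k]} = \Delta g_i^{[k]}, \quad A_i^{[k]} = (\sigma + a)\eta_i^{[k]} - \sigma\eta_i + 2B_i^{[k]},$$

$$B_i^{[k]} = \sigma\frac{\tau}{h^2}\left[ \frac{\tau}{2}\left( (\sigma + a) g_i^{[k]} + (1 - \sigma) g_i \right) + a\nu \right],$$

$$F_i^{[k]} = (\sigma + a)\,\Delta\eta_i^{[k]} z_i^{[k]}.$$

Obviously, $B_i^{[k]} > 0$. We require that the following inequality be satisfied:

$$A_i^{[k]} - 2B_i^{[k]} = (\sigma + a)\eta_i^{[k]} - \sigma\eta_i > 0. \qquad (9)$$

We note that the inequality $A_i^{[k]} > 0$ is then also satisfied.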


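For definiteness we restrict ourselves to the problem in which the variation of pressure with time
is specified on the boundaries of the domain $\Omega$. Taking into account also the fact that
at the boundary points the pseudo-viscosity is assumed to be “zeroed”, $\omega_0^{j+1} = \omega_N^{j+1} = 0$ (see
[1]), we have for Eqs. (8) the boundary conditions $z_0^{[k+1]} = z_N^{[k+1]} = 0$. For the inhomogeneous
equation (8) with homogeneous boundary conditions, when the inequality (9) is satisfied, we use the
maximum principle [2, 5], which in particular implies that

$$\|z^{[k+1]}\|_C \le \left\| \frac{F^{[k]}}{A^{[k]} - 2B^{[k]}} \right\|_C = \left\| \frac{(\sigma + a)\,\Delta\eta^{[k]} z^{[k]}}{(\sigma + a)\eta^{[k]} - \sigma\eta} \right\|_C \le q_k \|z^{[k]}\|_C, \qquad (10)$$

$$q_k = \left\| \frac{(\sigma + a)(\eta^{[k]} - \hat\eta)}{(\sigma + a)\eta^{[k]} - \sigma\eta} \right\|_C = \left\| \frac{\eta^{[k]} - \hat\eta}{\eta^{[k]} - \sigma\eta/(\sigma + a)} \right\|_C, \qquad \|y\|_C = \max_i |y_i|.$$

Obviously, Newton’s iterative process converges if $q_k < 1$ and condition (9), which can be
written in the form

$$\eta^{[k]} > \frac{\sigma}{\sigma + a}\eta, \qquad (11)$$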
is also satisfied. The inequality $q_k < 1$ is satisfied if for all $i = 1, 2, \ldots, N$

$$-1 < \frac{\eta_i^{[k]} - \hat\eta_i}{\eta_i^{[k]} - \sigma\eta_i/(\sigma + a)} < 1 \qquad (12)$$

holds. We consider two possibilities.

The first corresponds to rarefaction of the gas, when $\hat\eta > \eta$. Then in (12) the
condition on the right is satisfied automatically, and the one on the left leads to the inequality

$$2\eta^{[k]} > \hat\eta + \frac{\sigma}{\sigma + a}\eta, \qquad (13)$$

which simultaneously guarantees the satisfaction of condition (11).

In the contrary case ($\hat\eta < \eta$), corresponding to compression of the gas, condition (12) is
satisfied if the inequality (13) holds, and also the inequality

$$\hat\eta > \frac{\sigma}{\sigma + a}\eta, \qquad (14)$$

condition (11) being also satisfied.


It is reasonable to assume, and this is confirmed by calculations, that to satisfy the inequality
(13) for any $k$ it is sufficient to require its satisfaction at the zeroth iteration $k = 0$. Condition (13)
for $k = 0$ ($\eta^{[0]} = \eta$) can be transformed to the form

$$\hat\eta - \eta < \frac{a}{\sigma + a}\eta,$$

whence after division by $\eta$, using the notation of (3), and also the formula of difference
differentiation $\rho_t = -\eta_t/(\eta\hat\eta)$, $\eta = 1/\rho$ ($\rho$ is the density), we have

$$(1 + \sigma/a)\,\tau\hat\eta\rho_t > -1. \qquad (15)$$

We note that for the case of compression $\rho_t > 0$, and thereby condition (15) is satisfied. After
simple transformations the inequality (14) is reduced to the form

$$(1 + \sigma/a)\,\tau\hat\eta\rho_t < 1. \qquad (16)$$

Combining (15) and (16), we arrive at the condition

$$(1 + \sigma/a)\,\tau\|\hat\eta\rho_t\|_C < 1. \qquad (17)$$


We note that the condition for the convergence of the iterations obtained in [2, 4] for the
isothermal case without allowing for pseudo-viscosity has the form

$$\tau\|\hat\eta\|_C\|\rho_t\|_C < 1. \qquad (18)$$

Condition (17) for the isothermal case $\gamma = 1$, $a = \infty$ becomes the inequality

$$\tau\|\hat\eta\rho_t\|_C < 1,$$

which is less “strict” than (18). We note that the presence of pseudo-viscosity in the scheme has no
effect on the condition (17) or on the estimate of the rate of convergence (10). It must
be pointed out that the convergence of the iterative process (10) investigated above has the nature
of a geometrical progression with ratio $q_k$, while in [2, 4] the convergence
studied for the isothermal case is quadratic. From the inequality (17) it also follows that in the
adiabatic case the constraint on the mesh step is stricter than in the isothermal case.
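
The contraction factor in estimate (10) is easy to monitor in a calculation. The sketch below is an illustration only: it assumes the reading $q_k = \max_i |(\eta_i^{[k]} - \hat\eta_i)/(\eta_i^{[k]} - \sigma\eta_i/(\sigma + a))|$ with $a = 1/(\gamma - 1)$, where $\eta$, $\hat\eta$ and $\eta^{[k]}$ are the specific volume on the old layer, on the new layer and at the $k$-th iteration. The function name `contraction_factor` and the list-based interface are hypothetical.

```python
def contraction_factor(eta_iter, eta_new, eta_old, sigma, gamma):
    """Return q_k = max_i |(eta_iter - eta_new) / (eta_iter - sigma*eta_old/(sigma + a))|.

    eta_iter -- specific volume at the k-th iteration
    eta_new  -- exact solution of the difference problem on the new layer
    eta_old  -- specific volume on the previous time layer
    Raises ValueError if condition (11) is violated at some node.
    """
    a = 1.0 / (gamma - 1.0)
    q = 0.0
    for ek, en, eo in zip(eta_iter, eta_new, eta_old):
        denom = ek - sigma * eo / (sigma + a)
        if denom <= 0.0:
            raise ValueError("condition (11) is violated")
        q = max(q, abs(ek - en) / denom)
    return q
```

At the zeroth iteration one may pass the old layer as `eta_iter`; a returned value below 1 then corresponds to the geometric convergence described above.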


Translated by J. Berry. 
REFERENCES 


1. POPOV, Yu. P. and SAMARSKII, A. A. Completely conservative difference schemes. Zh. vychisl. Mat. mat. 
Fiz., 9, 4, 953-958, 1969. 

2. SAMARSKII, A. A. and POPOV, Yu. P. The difference schemes of gas dynamics (Raznostnye skhemy 
gazovoi dinamiki), "Nauka", Moscow, 1975. 

3. ROZHDESTVENSKII, B. L. and YANENKO, N. N. Systems of quasilinear equations (Sistemy 
kvazilineinykh uravnenii), "Nauka", Moscow, 1968. 

4. POPOV, Yu. P. and SAMARSKII, A. A. Methods for the numerical solution of one-dimensional 
non-stationary problems of gas dynamics. Zh. vychisl. Mat. mat. Fiz., 16, 6, 1503-1518, 1976. 

5. SAMARSKII, A. A. Introduction to the theory of difference schemes (Vvedenie v teoriyu raznostnykh 
skhem), "Nauka", Moscow, 1971. 





BOOK REVIEW* 


K. SARKADI and I. VINCZE. Mathematical methods of statistical quality control. 415 p. 
Akadémiai Kiadó, Budapest, 1974. 


Statistical control of the quality of production is one of the most important fields of 
application of the theory of probability and mathematical statistics. There are a number of books 
(including some in Russian) devoted to this field of applied mathematics, but many of them are at 
too low a mathematical level, and hence do not satisfy the demands of modern complex production. 


In the book under review, written by two prominent Hungarian specialists, the methods of 
mathematical quality control rest on a rigorous foundation in the theory of probability and 
mathematical statistics. A large part of the book is therefore devoted to the principles of these 
disciplines, without losing sight of the main purpose of the book. 


Of course it will not serve as a textbook on the theory of probability and mathematical 
statistics, since it expounds the principles of these sciences in outline, without detailed proofs of 
the theorems and without the large number of worked examples and problems characteristic 
of every textbook. Accordingly the book is intended for a wide range of applied mathematicians, 
engineers, students of higher educational establishments and others interested in the applications 
of the theory of probability and mathematical statistics. For all these categories of readers it will 
constitute an excellent reference book. 


The book consists of three parts and an appendix. The first part is introductory. Here the 
fundamental concepts and definitions used in the following parts are explained. The second part 
occupies more than half the book. Here the fundamental theorems of probability are explained. 
In particular the fundamental types of probability distribution, the principles of the sampling 
method, theories of order statistics etc. are explained in great detail. Also explained are the 
fundamental methods of mathematical statistics: the theory of estimation, the statistical testing of 
hypotheses, the principles of correlation and regression analysis, the theory of statistical decisions, 
the principles of stochastic processes etc. The third part is devoted to methods of statistical quality 
control. Here, after presenting the fundamental ideas, detailed explanations are given of the methods 
of continuous control and acceptance control, based on the main types of control charts and on 
various standards. The last section of this part is devoted to the principles of reliability theory. 


The book is supplemented by 14 basic statistical tables. Among them there are tables of 
random numbers, the normal probability density and distribution function, the Poisson distribution, 
the binomial distribution, the percentage points of the F-distribution, Student’s distribution, the 
χ²-distribution, the Kolmogorov-Smirnov distribution etc. There is an extensive bibliography, 
separated into books, papers, tables and standards, and also the necessary indexes. The printing of 
the book is excellent. The translation of this fundamental work into Russian is extremely desirable. 


M. K. Kerimov 
Translated by J. Berry. 